TITLE OF THE INVENTION

CHROMOSOME 1-LINKED PROSTATE CANCER SUSCEPTIBILITY GENE AND MULTISITE TUMOR SUPPRESSOR

FIELD OF THE INVENTION

The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human prostate cancer predisposing gene (HPC1), some mutant alleles of which cause susceptibility to cancer, in particular prostate cancer. More specifically, the invention relates to germline mutations in the HPC1 gene and their use in the diagnosis of predisposition to prostate cancer. The present invention further relates to somatic mutations in the HPC1 gene in human prostate cancer and their use in the diagnosis and prognosis of human prostate cancer. Additionally, the invention relates to somatic mutations in the HPC1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the HPC1 gene, including gene therapy, protein replacement therapy and protein mimetics. The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the HPC1 gene for mutations, which are useful for diagnosing the predisposition to prostate cancer.

The publications and other materials used herein to illuminate the background of the invention, and in particular cases to provide additional details respecting the practice, are incorporated herein by reference, and for convenience are referenced by author and date in the following text and respectively grouped in the appended List of References.






BACKGROUND OF THE INVENTION 

The genetics of cancer is complicated, involving gain or loss of function of three loosely defined classes of genes: (1) dominant, positive regulators of the transformed state (oncogenes); (2) recessive, negative regulators of the transformed state (tumor suppressor genes); and (3) recessive genes involved in maintenance of genome integrity (caretaker genes) (Kinzler and Vogelstein, 1997). Over one hundred oncogenes have been characterized. About a dozen tumor suppressor genes and a similar number of caretaker genes have been identified; the number of genes falling into these last two classes is expected to increase beyond fifty (Knudson, 1993).

The involvement of so many genes underscores the complexity of the growth control mechanisms that operate in cells to maintain the integrity of normal tissue. This complexity is manifest in another way. So far, no single gene has been shown to participate in the development of all, or even the majority of, human cancers. The most common oncogenic mutations are in the H-ras gene, found in 10-15% of all solid tumors (Anderson et al., 1992). The most frequently mutated tumor predisposition genes are the TP53 gene, homozygously deleted or mutated in roughly 50% of all tumors, and CDKN2, which was homozygously deleted in 46% of tumor cell lines examined (Kamb et al., 1994). Without a target that is common to all transformed cells, the dream of a "magic bullet" that can destroy or revert cancer cells while leaving normal tissue unharmed is improbable. The hope for a new generation of specifically targeted antitumor drugs may rest on the ability to identify oncogenes, tumor suppressor genes, and caretaker genes that play general roles in the process of oncogenesis.

Specific germline alleles of certain oncogenes, tumor suppressor genes, and caretaker genes are causally associated with predisposition to cancer. This set of genes is referred to as tumor predisposition genes. Some of the tumor predisposition genes which have been cloned and characterized influence susceptibility to: 1) Retinoblastoma (RB1); 2) Wilms' tumor (WT1); 3) Li-Fraumeni syndrome (TP53); 4) Familial adenomatous polyposis (APC); 5) Neurofibromatosis type 1 (NF1); 6) Neurofibromatosis type 2 (NF2); 7) von Hippel-Lindau syndrome (VHL); 8) Multiple endocrine neoplasia type 2A (MEN2A); 9) Melanoma (CDKN2 and CDK4); 10) Breast and ovarian cancer (BRCA1 and BRCA2); 11) Cowden disease (MMAC1); 12) Multiple endocrine neoplasia type 1 (MEN1); 13) Nevoid basal cell carcinoma syndrome (PTC); 14) Tuberous sclerosis 2 (TSC2); 15) Xeroderma pigmentosum (genes involved in nucleotide excision repair); and 16) Hereditary nonpolyposis colorectal cancer (genes involved in mismatch repair).

Tumor predisposition loci that have been mapped genetically but not yet isolated include genes for: Lynch cancer family syndrome 2 (LCFS2); Neuroblastoma (NB); Beckwith-Wiedemann syndrome (BWS); Renal cell carcinoma (RCC); and Tuberous sclerosis 1 (TSC1).

Tumor predisposition genes that have been characterized to date encode products with similarities to a variety of protein types, including DNA binding proteins (WT1), ancillary transcription regulators (RB1), GTPase activating proteins or GAPs (NF1), cytoskeletal components (NF2), membrane bound receptor kinases (MEN2A), cell cycle regulators (CDKN2 and CDK4), tyrosine phosphatases (MMAC1), as well as others with no obvious similarity to proteins of known function (BRCA2).

In many cases, the tumor predisposition gene originally identified through genetic studies has been shown to be lost or mutated in some sporadic tumors. This result suggests that regions of chromosomal aberration, whether germline, in tumors, or in tumor cell lines, may signify the position of important tumor predisposition genes involved both in genetic predisposition to cancer and in sporadic cancer.

One of the hallmarks of several tumor suppressor and caretaker genes characterized to date is that their function is lost at high frequency in certain tumor types. Loss of function is often a consequence of deletion. The deletions can involve loss of a single allele, a so-called loss of heterozygosity (LOH), but may also involve homozygous deletion of both alleles. For LOH, the remaining allele is presumed to be nonfunctional, either because of a preexisting inherited mutation or because of a secondary sporadic mutation. Conversely, a number of oncogenes are subject to gain of function in certain tumor types. Gain of function often involves amplification of the copy number of a gene but may also occur when a chromosomal translocation generates a chimeric gene. Gain of function can also be a consequence of point mutations that alter some aspect of gene regulation or function.

Prostate cancer is the most common cancer in men in many western countries, and the second leading cause of cancer deaths in men. It accounts for more than 40,000 deaths in the US annually, and the number of deaths is likely to continue rising over the next 10 to 15 years. In the US, prostate cancer is estimated to cost $1.5 billion per year in direct medical expenses. In addition to the suffering it causes, it is a major public health issue. Numerous studies have provided evidence for familial clustering of prostate cancer, indicating that family history is a major risk factor for this disease (Cannon et al., 1982; Steinberg et al., 1990; Carter et al., 1993).

Prostate cancer has long been recognized to be, in part, a familial disease. Numerous investigators have examined the evidence for genetic inheritance and concluded that the data are most consistent with dominant inheritance for a major susceptibility locus or loci. Woolf (1960) described a relative risk of 3.0 of developing prostate cancer among first-degree relatives of prostate cancer cases in Utah using death certificate data. Relative risks ranging from 3 to 11 for first-degree relatives of prostate cancer cases have been reported (Cannon et al., 1982; Woolf, 1960; Fincham et al., 1990; Meikle et al., 1985; Krain, 1974; Morganti et al., 1956; Goldgar et al., 1994). Carter et al. (1992) performed segregation analysis on families ascertained through a single prostate cancer proband. The analysis suggested Mendelian inheritance in a subset of families through autosomal dominant inheritance of a rare (q = 0.003), high-risk allele, with an estimated cumulative risk of prostate cancer for carriers of 88% by age 85. Inherited prostate cancer susceptibility accounted for a significant proportion of early-onset disease and overall was responsible for 9% of prostate cancer occurrence by age 85. Recent results demonstrate that at least two loci exist which confer susceptibility to prostate cancer as well as other cancers. These loci are HPC1 on chromosome 1 (Smith et al., 1996) and one or more loci responsible for the unmapped residual.

Smith et al. (1996) indicated that inherited prostate cancer susceptibility in kindreds with early age of onset is linked to chromosome 1. Most strategies for cloning the chromosome 1-linked prostate cancer predisposing gene (HPC1) require precise genetic localization studies. The simplest model for the functional role of HPC1 holds that alleles of HPC1 that predispose to cancer are recessive to wild type alleles; that is, cells that contain at least one wild type HPC1 allele are not cancerous. However, cells that contain one wild type HPC1 allele and one predisposing allele may occasionally suffer loss of the wild type allele, either by random mutation or by chromosome loss during cell division (nondisjunction). All the progeny of such a mutant cell lack the wild type function of HPC1 and may develop into tumors. According to this model, predisposing alleles of HPC1 are recessive, yet susceptibility to cancer is inherited in a dominant fashion: men who possess one predisposing allele (and one wild type allele) risk developing cancer, because their prostate cells may spontaneously lose the wild type HPC1 allele. This model applies to both the tumor suppressor and caretaker genes described above. By inference, this model may also explain HPC1 function, as has recently been suggested (Smith et al., 1996).

A second possibility is that HPC1 predisposing alleles are truly dominant; that is, a wild type allele of HPC1 cannot overcome the tumor-forming role of the predisposing allele. Thus, a cell that carries both wild type and mutant alleles would not necessarily lose the wild type copy of HPC1 before giving rise to malignant cells. Instead, prostate cells in predisposed individuals would undergo some other stochastic change(s) leading to cancer.

If HPC1 predisposing alleles are recessive, the HPC1 gene is expected to be expressed in normal prostate tissue but not functionally expressed in prostate tumors. In contrast, if HPC1 predisposing alleles are dominant, the wild type HPC1 gene may or may not be expressed in normal prostate tissue. However, the predisposing allele will likely be expressed in prostate tumor cells.

Evidence for a prostate cancer susceptibility locus (HPC1) on the long arm of chromosome 1, which is hypothesized to explain approximately 35% of families, was recently presented (Smith et al., 1996). Although several groups report evidence supporting this localization, it has not yet been confirmed statistically. Both the original Smith et al. report and a subsequent analysis of additional families (Cooney et al., 1997) suggest that the bulk of the linkage evidence comes from African-American high-risk kindreds. In addition, it appears that this gene predisposes primarily (although not exclusively) to early onset prostate cancer.

The chromosome 1 linkage of HPC1 has not been statistically confirmed; however, a report by Cooney et al. (1997) as well as a manuscript in review (Neuhausen et al., in press) are suggestive of confirmation, with less-than-significant indications of linkage at the location suggested to harbor HPC1. We have localized the HPC1 gene to an approximately 1 cM interval bounded by J23 and E15. A multipoint analysis of all 22 Utah kindreds resulted in a heterogeneity lod score of +1.20 at D1S254, with an estimate that 5% of kindreds are linked. This analysis excluded linkage for an alpha (the proportion of linked kindreds) greater than 0.33. We have a set of 5 Utah kindreds showing evidence of a common segregating haplotype surrounding D1S254, which themselves define a region of less than 1 cM in which HPC1 must lie.
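
The heterogeneity lod score quoted above comes from an admixture model, in which a proportion alpha of kindreds is linked to the locus and the remainder are not. As an illustrative sketch only, assuming the standard admixture likelihood of the kind computed by programs such as HOMOG, the score for a given alpha is the sum over kindreds of log10(alpha * 10^Z_i + 1 - alpha), where Z_i is kindred i's lod score at the test position. The function names and the per-kindred lod scores below are hypothetical and are not the Utah data.

```python
import math


def hlod(kindred_lods: list[float], alpha: float) -> float:
    """Admixture (heterogeneity) lod score for a fixed proportion alpha of linked kindreds."""
    total = 0.0
    for z in kindred_lods:
        # Likelihood ratio for one kindred under the admixture model.
        ratio = alpha * 10 ** z + (1.0 - alpha)
        if ratio <= 0.0:  # e.g. alpha = 1 with a kindred that excludes linkage (Z = -inf)
            return float("-inf")
        total += math.log10(ratio)
    return total


def max_hlod(kindred_lods: list[float], steps: int = 1000) -> tuple[float, float]:
    """Grid-search alpha over [0, 1]; returns (best_alpha, best_hlod).

    alpha = 0 (no kindred linked) always scores exactly 0, so the maximum is >= 0.
    """
    best_alpha, best_score = 0.0, 0.0
    for i in range(steps + 1):
        alpha = i / steps
        score = hlod(kindred_lods, alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


if __name__ == "__main__":
    # Hypothetical lod scores: one strongly linked kindred and two unlinked ones.
    lods = [3.0, -2.0, -2.0]
    alpha, score = max_hlod(lods)
    print(f"alpha = {alpha:.3f}, HLOD = {score:.2f}, homogeneity lod = {sum(lods):.2f}")
```

The grid search is deliberately simple; in practice the score is also evaluated across map positions, as in the walking three-point analysis plotted in Figure 3.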

Identification of a prostate cancer susceptibility locus would permit the early detection of susceptible individuals and greatly increase our ability to understand the initial steps which lead to cancer. As susceptibility loci are often altered during tumor progression, cloning these genes could also be important in the development of better diagnostic and prognostic products, as well as better cancer therapies.

SUMMARY OF THE INVENTION

The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human prostate cancer predisposing gene (HPC1), some alleles of which cause susceptibility to cancer, in particular prostate cancer. More specifically, the present invention relates to germline mutations in the HPC1 gene and their use in the diagnosis of predisposition to prostate cancer. The invention also relates to presymptomatic therapy of individuals who carry deleterious alleles of the HPC1 gene. The invention further relates to somatic mutations in the HPC1 gene in human prostate cancer and their use in the diagnosis and prognosis of human prostate cancer. Additionally, the invention relates to somatic mutations in the HPC1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the HPC1 gene (including gene therapy, protein replacement therapy, protein mimetics, and inhibitors). The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the HPC1 gene for mutations, which are useful for diagnosing the predisposition to prostate cancer.

WO 00/12694 PCT/US99/19508


BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a diagram showing the order of genetic markers neighboring HPC1, a schematic map of YACs spanning the HPC1 region, a schematic map of BACs and PACs spanning the HPC1 region, and the location of the HPC1 gene within the genetically defined interval.

Figure 2 is a diagram of the HPC1 transcription unit showing the locations of the exons 
of HPC1 relative to the BAC/PAC contig and relative to each other. The individual exons are 
numbered, and these numbers correspond to their SEQ ID NOs. 

Figure 3 shows multipoint lod scores for the prostate cancer susceptibility locus relative to the 1q24-25 markers. A walking three-point analysis with markers D1S2883, D1S254 and D1S412 is plotted as a function of distance from D1S2883. The combined values are plotted both assuming that all kindreds are linked and with a heterogeneity estimate of 0.05 (estimated with HOMOG). The maximum lod score under heterogeneity is 1.20 at D1S254.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates generally to the field of human genetics. Specifically, the present invention relates to methods and materials used to isolate and detect a human prostate cancer predisposing gene (HPC1), some alleles of which cause susceptibility to cancer, in particular prostate cancer. More specifically, the present invention relates to germline mutations in the HPC1 gene and their use in the diagnosis of predisposition to prostate cancer. The invention also relates to presymptomatic therapy of individuals who carry deleterious alleles of the HPC1 gene. The invention further relates to somatic mutations in the HPC1 gene in human prostate cancer and their use in the diagnosis and prognosis of human prostate cancer. Additionally, the invention relates to somatic mutations in the HPC1 gene in other human cancers and their use in the diagnosis and prognosis of human cancers. The invention also relates to the therapy of human cancers which have a mutation in the HPC1 gene (including gene therapy, protein replacement therapy, protein mimetics, and inhibitors). The invention further relates to the screening of drugs for cancer therapy. Finally, the invention relates to the screening of the HPC1 gene for mutations, which are useful for diagnosing the predisposition to prostate cancer.

The present invention provides an isolated polynucleotide comprising all, or a portion, of the HPC1 locus or of a mutated HPC1 locus, preferably at least eight bases and not more than about 300 kb in length. Such polynucleotides may be antisense polynucleotides. The present invention also provides a recombinant construct comprising such an isolated polynucleotide, for example, a recombinant construct suitable for expression in a transformed host cell.

Also provided by the present invention are methods of detecting a polynucleotide comprising a portion of the HPC1 locus or its expression product in an analyte. Such methods may further comprise the step of amplifying the portion of the HPC1 locus, and may further include a step of providing a set of polynucleotides which are primers for amplification of said portion of the HPC1 locus. The method is useful for either diagnosis of the predisposition to cancer or the diagnosis or prognosis of cancer.

The present invention also provides isolated antibodies, preferably monoclonal 
antibodies, which specifically bind to an isolated polypeptide comprised of at least five amino 
acid residues encoded by the HPC1 locus. 

The present invention also provides kits for detecting in an analyte a polynucleotide comprising a portion of the HPC1 locus, the kits comprising a polynucleotide complementary to the portion of the HPC1 locus packaged in a suitable container, and instructions for its use.

The present invention further provides methods of preparing a polynucleotide comprising polymerizing nucleotides to yield a sequence comprised of at least eight consecutive nucleotides of the HPC1 locus; and methods of preparing a polypeptide comprising polymerizing amino acids to yield a sequence comprising at least five amino acids encoded within the HPC1 locus.

The present invention further provides methods of screening the HPC1 gene to identify mutations. Such methods may further comprise the step of amplifying a portion of the HPC1 locus, and may further include a step of providing a set of polynucleotides which are primers for amplification of said portion of the HPC1 locus. Such methods may also include a step of providing the complete set of short polynucleotides defined by the sequence of HPC1 or discrete subsets of that sequence, all single-base substitutions of that sequence or discrete subsets of that sequence, all 1-, 2-, 3-, or 4-base deletions of that sequence or discrete subsets of that sequence, and all 1-, 2-, 3-, or 4-base insertions in that sequence or discrete subsets of that sequence. The method is useful for identifying mutations for use in either diagnosis of the predisposition to cancer or the diagnosis or prognosis of cancer.

The present invention further provides methods of screening suspected HPC1 mutant alleles to identify mutations in the HPC1 gene.

In addition, the present invention provides methods to screen drugs for inhibition or restoration of HPC1 gene product function as an anticancer therapy.

Finally, the present invention provides the means necessary for production of gene-based therapies directed at cancer cells. These therapeutic agents may take the form of polynucleotides comprising all or a portion of the HPC1 locus placed in appropriate vectors or delivered to target cells in more direct ways such that the function of the HPC1 protein is reconstituted. Therapeutic agents may also take the form of polypeptides based on either a portion of, or the entire, protein sequence of HPC1. These may functionally replace the activity of HPC1 in vivo.

It is a discovery of the present invention that the HPC1 locus, which predisposes individuals to prostate cancer, is a gene encoding an HPC1 protein, which has been found to have no significant homology with publicly available protein or DNA sequences. This gene is termed HPC1 herein. It is a discovery of the present invention that mutations in the HPC1 locus in the germline are indicative of a predisposition to prostate cancer. It is a discovery of the present invention that somatic mutations in the HPC1 locus are also associated with prostate and other types of cancer. Finally, it is a discovery of the present invention that two common polymorphisms of HPC1 are associated with both prostate and many other types of cancer. The mutational events of the HPC1 locus can involve deletions, insertions and point mutations within the coding sequence and the non-coding sequence.

STRATEGY FOR THE MOLECULAR CLONING OF HPC1

Within a region of human chromosome 1, a genetic locus, HPC1, which causes susceptibility to cancer, including prostate cancer, has been identified.

The region containing the HPC1 locus was identified using a variety of genetic techniques. Genetic mapping techniques initially defined the HPC1 region in terms of recombination with genetic markers. Based upon studies of large extended families ("kindreds") with multiple cases of prostate cancer, a chromosomal region has been pinpointed that contains the HPC1 gene as well as putative susceptibility alleles in the HPC1 locus. Two meiotic breakpoints, detected as recombinants between genetic markers and the disease, have been found on the distal side of the HPC1 locus, and one such recombinant on the proximal side. Thus, a region which contains the HPC1 locus is physically bounded by these markers.

Population Resources 

Large, well-documented Utah kindreds are especially important in providing good resources for human genetic studies. Each large kindred independently gives evidence of whether or not an HPC1 susceptibility allele is segregating in that family. Recombinants informative for localization and isolation of the HPC1 locus could be obtained only from kindreds large enough to confirm the presence of a susceptibility allele. Large sibships are especially important for studying prostate cancer, since penetrance of the HPC1 susceptibility allele is limited by both age and sex, making informative sibships difficult to find. Furthermore, large sibships are essential for constructing haplotypes of deceased individuals by inference from the haplotypes of their close relatives.

Genetic Mapping 

Given a set of informative families, genetic markers are essential for linking a disease to a region of a chromosome. Such markers include restriction fragment length polymorphisms (RFLPs) (Botstein et al., 1980), markers with a variable number of tandem repeats (VNTRs) (Jeffreys et al., 1985; Nakamura et al., 1987), and an abundant class of DNA polymorphisms based on short tandem repeats (STRs), especially repeats of CpA (Weber and May, 1989; Litt et al., 1989). To generate a genetic map, one selects potential genetic markers and tests them using DNA extracted from members of the kindreds being studied.

Genetic markers useful in searching for a genetic locus associated with a disease can be selected on an ad hoc basis, by densely covering a specific chromosome, or by detailed analysis of a specific region of a chromosome. A preferred method for selecting genetic markers linked with a disease involves evaluating the degree of informativeness of kindreds to determine the ideal distance between genetic markers of a given degree of polymorphism, then selecting markers from known genetic maps which are ideally spaced for maximal efficiency. The informativeness of a marker is measured by the probability that it will be heterozygous in an unrelated individual. It is also most efficient to use STR markers which are detected by amplification of the target nucleic acid sequence using PCR; such markers are highly informative, easy to assay (Weber and May, 1989), and can be assayed simultaneously using multiplexing strategies (Skolnick and Wallace, 1988), greatly reducing the number of experiments required.

Once linkage has been established, one needs to find markers that flank the disease locus, i.e., one or more markers proximal to the disease locus and one or more markers distal to it. Where possible, candidate markers can be selected from a known genetic map. Where none is known, new markers can be identified by the STR technique, as shown in the Examples.

Genetic mapping is usually an iterative process. In the present invention, it began by defining flanking genetic markers around the HPC1 locus, then replacing these flanking markers with other markers that were successively closer to the HPC1 locus. As an initial step, recombination events, defined by large extended kindreds, helped specifically to localize the HPC1 locus as either distal or proximal to regionally localized specific genetic markers.
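
The narrowing step can be viewed as simple bookkeeping: each informative recombinant shows that the locus lies on one side of a given marker, and the candidate interval is bounded by the closest such marker on each side. This is a hypothetical sketch assuming marker positions in cM that increase in the distal direction; the marker names and positions are invented and are not the markers of Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recombinant:
    """One meiotic recombinant between the disease and a marker."""
    marker: str
    locus_is_distal: bool  # True: the recombinant places the locus distal to `marker`


def flanking_markers(positions: dict[str, float],
                     recombinants: list[Recombinant]) -> tuple[str, str]:
    """Return (proximal flanking marker, distal flanking marker).

    Positions are in cM and assumed to increase distally.
    """
    proximal, distal = None, None
    for rec in recombinants:
        pos = positions[rec.marker]
        if rec.locus_is_distal:
            # Locus lies beyond this marker: keep the most distal such marker.
            if proximal is None or pos > positions[proximal]:
                proximal = rec.marker
        else:
            # Locus lies before this marker: keep the most proximal such marker.
            if distal is None or pos < positions[distal]:
                distal = rec.marker
    if proximal is None or distal is None:
        raise ValueError("need recombinants on both sides of the locus")
    if positions[proximal] >= positions[distal]:
        raise ValueError("recombinants are inconsistent with a single locus")
    return proximal, distal


if __name__ == "__main__":
    positions = {"M1": 0.0, "M2": 1.5, "M3": 3.0, "M4": 4.2}  # hypothetical map
    recs = [Recombinant("M1", True), Recombinant("M2", True),
            Recombinant("M4", False), Recombinant("M3", False)]
    print(flanking_markers(positions, recs))  # ('M2', 'M3')
```

Each new recombinant can only tighten the interval, which mirrors the iterative replacement of flanking markers described above.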

Contig assembly 

Given a genetically defined interval flanked by meiotic recombinants, one needs to generate a contig of genomic clones that spans that interval. Publicly available resources, such as the Whitehead integrated maps of the human genome (e.g., the WICGR Chr 1 map of Nov. 19, 1996), provide aligned chromosome maps of genetic markers, other sequence tagged sites (STSs), radiation hybrid map data, and CEPH yeast artificial chromosome (YAC) clones. From the map data, one can often identify a set of YACs that span the genetically defined interval. Oligonucleotide primer pairs for the markers located in the interval can be synthesized and used to screen libraries of bacterial artificial chromosomes (BACs) and P1 artificial chromosomes (PACs). Successive rounds of BAC/PAC library screening with BAC or PAC end markers enable the completion of a BAC/PAC clone contig that spans the genetically defined interval. A set of overlapping but non-redundant BAC and PAC clones that span this interval (Figure 1C) (the tiling path) can then be selected for use in subsequent molecular cloning protocols.

Genomic sequencing

Given a tiling path of BAC and PAC clones across a defined interval, one useful gene finding strategy is to generate an almost complete genomic sequence of that interval. Random genomic clone sublibraries can be prepared from each BAC or PAC clone in the tiling path. Enough individual sublibrary clones to give, on average, 6x redundant sequence coverage of each BAC or PAC can then be end-sequenced with vector primers. These sequences can be assembled into sequence contigs, and these contigs placed in a local genomic sequence database. One can search the genomic sequence contigs for sequence similarity with known genes and expressed sequence tags (ESTs), examine them for the presence of long open reading frames, and characterize them for CpG dinucleotide frequency.
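
The arithmetic and sequence scans in this paragraph can be sketched in a few helpers. The read count for a target redundancy is coverage times clone length divided by read length; the fraction of bases expected to remain unread uses the Lander-Waterman (Poisson) approximation e^-c; CpG content is reported as the commonly used observed/expected CpG ratio; and the ORF scan looks for ATG-to-stop frames on both strands. These are generic, hypothetical helpers: the source does not specify read lengths, thresholds, or software.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reads_needed(clone_len_bp: int, read_len_bp: int, coverage: float = 6.0) -> int:
    """Reads required for `coverage`-fold redundancy of one BAC or PAC."""
    return math.ceil(coverage * clone_len_bp / read_len_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases never read."""
    return math.exp(-coverage)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)  # "CG" cannot overlap itself


def longest_orf(seq: str) -> int:
    """Length in bp (stop included) of the longest ATG...stop frame on either strand.

    Frames still open at the end of the sequence are ignored.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, seq[::-1].translate(COMPLEMENT)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None
    return best


if __name__ == "__main__":
    print(reads_needed(150_000, 500))          # 1800 reads for a 150-kb clone at 6x
    print(fraction_unsequenced(6.0))
    print(cpg_obs_exp("CGCGATCGTTAGCG"))
    print(longest_orf("CCATGAAACCCTGATT"))     # 12
```

When both ends of each sublibrary clone are read, the number of clones to pick is roughly half the read count.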

Hybrid selection

Given a tiling path of BAC and PAC clones across a defined interval, another useful gene finding strategy is to obtain cDNA clones cognate to the tiling path BACs and PACs. One preferred cDNA cloning strategy is hybrid selection. cDNA can be prepared from a number of human tissues and human cell lines in such a manner that the cDNA molecules have PCR primer binding sites (anchors) at each end. This cDNA can be affinity captured with the tiling path BACs and PACs. Captured cDNA can then be PCR amplified using the anchor primers and then cloned. Individual clones can then be end-sequenced with vector primers. The sequences of these cDNA clones can be analyzed for similarity to genomic sequence contigs generated from BACs and PACs on the tiling path. One can then identify individual exons of genes in the genetically defined interval by parsing the sequences of true-positive hybrid selected clones across these genomic sequence contigs.

RACE and inter-exon PCR

While hybrid selection is an efficient approach to the initial identification of novel genes located within a defined interval of the genome, it is often not the most efficient way to complete the cloning of those genes. Rapid amplification of cDNA ends (RACE) provides a PCR-based method to identify new 5' and 3' cDNA sequences. cDNA can be prepared from a number of human tissues in such a manner that the cDNA molecules have PCR primer binding sites (anchors) at their 5' ends, 3' ends, or both. PCR amplification from this cDNA with 5' end anchor primers and gene-specific reverse primers can generate 5' RACE products. Similarly, PCR amplification with 3' end anchor primers and gene-specific forward primers can generate 3' RACE products.

cDNA cloning techniques can also miss exons that lie between already known exons of a gene; for instance, this can easily occur if a particular exon is only included in a relatively rare splice variant of a transcript. Combinatorial inter-exon PCR is an effective strategy for detecting these exons. One can design a forward primer based on sequences from the first known exon of the gene and a set of reverse primers, one based on the sequence of each of the downstream exons (or any subset thereof) of the gene. One can then PCR amplify from cDNA of tissues and cell lines thought to express the gene, using every combination of the forward primer with each reverse primer. More complex combinations can also be tried, up to a forward primer from each exon paired with a reverse primer from each exon, subject only to the limitation that the forward primer should be from an exon upstream of the exon from which the reverse primer was designed. PCR products which differ in length from the expected product can be gel purified. In either RACE or combinatorial inter-exon PCR, the PCR products can either be gel purified and then sequenced directly, or first cloned and then sequenced.
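
Planning combinatorial inter-exon PCR is mostly bookkeeping: enumerate forward/reverse primer pairs with the forward exon upstream of the reverse exon, predict each expected product length, and flag bands that differ from it. In this hypothetical sketch, each forward primer is assumed to sit at the 5' end of its exon and each reverse primer at the 3' end of its exon, so the expected cDNA product is the summed length of the exons from the forward to the reverse exon; the exon names, lengths and band sizes are invented.

```python
from itertools import combinations


def primer_pairs(exon_lengths: dict[str, int],
                 first_exon_only: bool = True) -> list[tuple[str, str, int]]:
    """(forward exon, reverse exon, expected cDNA product length in bp).

    exon_lengths must be ordered 5' -> 3'. With first_exon_only=True the forward
    primer always comes from the first exon; otherwise every upstream/downstream
    exon combination is returned.
    """
    names = list(exon_lengths)
    if first_exon_only:
        pairs = [(0, j) for j in range(1, len(names))]
    else:
        pairs = list(combinations(range(len(names)), 2))  # i < j guaranteed
    return [(names[i], names[j], sum(exon_lengths[n] for n in names[i:j + 1]))
            for i, j in pairs]


def unexpected_bands(expected_bp: int, observed_bp: list[int],
                     tolerance_bp: int = 10) -> list[int]:
    """Observed product sizes differing from the expected size by more than tolerance_bp."""
    return [size for size in observed_bp if abs(size - expected_bp) > tolerance_bp]


if __name__ == "__main__":
    exons = {"ex1": 100, "ex2": 80, "ex3": 120}  # hypothetical exon lengths (bp)
    for fwd, rev, size in primer_pairs(exons, first_exon_only=False):
        print(f"{fwd} x {rev}: expect {size} bp")
    # A 420-bp band where 300 bp is expected would suggest an extra ~120-bp exon.
    print(unexpected_bands(300, [300, 420]))
```

Bands returned by `unexpected_bands` correspond to the products the text suggests gel purifying and sequencing.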

cDNA library screening

Another useful strategy for finding new 5', 3', or internal sequences is cDNA library screening. One can make or purchase bacteriophage λ cDNA libraries prepared from RNA from tissues or cell lines thought to express the gene. One then screens plaque lifts from those libraries with labeled nucleic acid probes based on the currently known sequences of the gene of interest. Individual positive clones are purified, and the clone inserts can then be sequenced.

Mutation screening

Proof that any particular gene located within the genetically defined interval is HPC1 is obtained by finding sequences in DNA or RNA extracted from affected kindred members which create abnormal HPC1 gene products or abnormal levels of HPC1 gene product. Such HPC1 susceptibility alleles will co-segregate with the disease in large kindreds. They will also be present at a much higher frequency in non-kindred individuals with prostate cancer than in individuals in the general population. Finally, since tumors often mutate somatically at loci which are in other instances mutated in the germline, we expect to see normal germline HPC1 alleles mutated into sequences which are identical or similar to HPC1 susceptibility alleles in DNA extracted from tumor tissue. Whether one is comparing HPC1 sequences from tumor tissue to HPC1 alleles from the germline of the same individuals, or one is comparing germline HPC1 alleles from cancer cases to those from unaffected individuals, the key is to find mutations which are serious enough to cause obvious disruption to the normal function of the gene product.
These mutations can take a number of forms. The most severe forms would be frameshift mutations or large deletions which would cause the gene to code for an abnormal protein or which would significantly alter protein expression. Less severe disruptive mutations would include small in-frame deletions and nonconservative base pair substitutions which would have a significant effect on the protein produced, such as changes to or from a cysteine residue, from a basic to an acidic amino acid or vice versa, from a hydrophobic to a hydrophilic amino acid or vice versa, or other mutations which would affect secondary, tertiary or quaternary protein structure. Small deletions or base pair substitutions could also significantly alter protein expression by changing the level of transcription, the splice pattern, mRNA stability, or the translation efficiency of the HPC1 transcript. Silent mutations or those resulting in conservative amino acid substitutions would not generally be expected to disrupt protein function.
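
The substitution categories listed above can be expressed as a simple rule set. The Python sketch below is a rough triage aid under stated assumptions: the residue groupings (BASIC, ACIDIC, HYDROPHOBIC, HYDROPHILIC) are a simplified, assumed classification rather than one given in this specification, residues such as G, P and Y are deliberately left unclassified, and effects on higher-order structure or on expression are not modeled.

```python
# Simplified residue classes (an assumption for illustration; other groupings exist).
BASIC = set("KRH")
ACIDIC = set("DE")
HYDROPHOBIC = set("AVLIMFWC")
HYDROPHILIC = set("STNQDEKRH")


def classify_substitution(ref_aa, alt_aa):
    """Rough severity class for a single amino acid change
    (one-letter codes, '*' for a stop codon)."""
    ref_aa, alt_aa = ref_aa.upper(), alt_aa.upper()
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    if "C" in (ref_aa, alt_aa):
        return "nonconservative: cysteine change"
    # Both residues charged, exactly one of them basic -> basic/acidic swap.
    if {ref_aa, alt_aa} <= BASIC | ACIDIC and (ref_aa in BASIC) != (alt_aa in BASIC):
        return "nonconservative: charge reversal"
    if (ref_aa in HYDROPHOBIC and alt_aa in HYDROPHILIC) or (
            ref_aa in HYDROPHILIC and alt_aa in HYDROPHOBIC):
        return "nonconservative: hydrophobicity change"
    return "possibly conservative"


for change in [("L", "I"), ("K", "E"), ("Y", "C"), ("L", "K"), ("R", "*")]:
    print(change, classify_substitution(*change))
```
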
Useful Diagnostic Techniques

According to the diagnostic and prognostic method of the present invention, alteration of the wild-type HPC1 locus is detected. In addition, the method can be performed by detecting the wild-type HPC1 locus and confirming the lack of a predisposition to cancer at the HPC1 locus. "Alteration of a wild-type gene" encompasses all forms of mutations, including deletions, insertions and point mutations in the coding and noncoding regions. Deletions may be of the entire gene or of only a portion of the gene. Point mutations may result in stop codons, frameshift mutations or amino acid substitutions. Somatic mutations are those which occur only in certain tissues, e.g., in the tumor tissue, and are not inherited in the germline. Germline mutations can be found in any of a body's tissues and are inherited. If only a single allele is somatically mutated, an early neoplastic state is indicated. However, if both alleles are somatically mutated, then a late neoplastic state is indicated. The finding of HPC1 mutations thus provides both diagnostic and prognostic information. An HPC1 allele which is not deleted (e.g., found on the sister chromosome to a chromosome carrying an HPC1 deletion) can be screened for other mutations, such as insertions, small deletions, and point mutations. It is believed that many mutations found in tumor tissues will be those leading to decreased expression of the HPC1 gene product. However, mutations leading to non-functional gene products would also lead to a cancerous state. Point mutational events may occur in regulatory regions, such as in the promoter of the gene, leading to loss or diminution of expression of the mRNA. Point mutations may also abolish proper RNA processing, leading to reduction or loss of expression of the HPC1 gene product, expression of an altered HPC1 gene product, or a decrease in mRNA stability or translation efficiency.

Useful diagnostic techniques include, but are not limited to, fluorescent in situ hybridization (FISH), direct DNA sequencing, PFGE analysis, Southern blot analysis, single-stranded conformation analysis (SSCA), RNase protection assay, allele-specific oligonucleotide (ASO), dot blot analysis and PCR-SSCP, as discussed in detail further below. DNA microchip technology, a recently developed technique, is also useful.

Predisposition to cancers, such as prostate cancer, and the other cancers identified herein, can be ascertained by testing any tissue of a human for mutations of the HPC1 gene. For example, a person who has inherited a germline HPC1 mutation would be prone to develop cancers. This can be determined by testing DNA from any tissue of the person's body. Most simply, blood can be drawn and DNA extracted from the cells of the blood. In addition, prenatal diagnosis can be accomplished by testing fetal cells, placental cells or amniotic cells for mutations of the HPC1 gene. Alteration of a wild-type HPC1 allele, whether, for example, by point mutation or deletion, can be detected by any of the means discussed herein.

There are several methods that can be used to detect DNA sequence variation. Direct DNA sequencing, either manual sequencing or automated fluorescent sequencing, can detect sequence variation. For a gene as large as HPC1, manual sequencing is very labor-intensive, but under optimal conditions, mutations in the coding sequence of a gene are rarely missed. Another approach is the single-stranded conformation polymorphism assay (SSCA) (Orita et al., 1989). This method does not detect all sequence changes, especially if the DNA fragment size is greater than 200 bp, but can be optimized to detect most DNA sequence variation. The reduced detection sensitivity is a disadvantage, but the increased throughput possible with SSCA makes it an attractive, viable alternative to direct sequencing for mutation detection on a research basis. The fragments which have shifted mobility on SSCA gels are then sequenced to determine the exact nature of the DNA sequence variation. Other approaches based on the detection of mismatches between the two complementary DNA strands include clamped denaturing gel electrophoresis (CDGE) (Sheffield et al., 1991), heteroduplex analysis (HA) (White et al., 1992) and chemical mismatch cleavage (CMC) (Grompe et al., 1989). None of the methods described above will detect large deletions, duplications or insertions, nor will they detect a regulatory mutation which affects transcription or translation of the protein. Other methods which might detect these classes of mutations, such as a protein truncation assay or the asymmetric assay, detect only specific types of mutations and would not detect missense mutations. Currently available methods of detecting DNA sequence variation are reviewed in Grompe (1993). Once a mutation is known, an allele-specific detection approach such as allele-specific oligonucleotide (ASO) hybridization can be used to rapidly screen large numbers of other samples for that same mutation.
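
Because SSCA sensitivity drops for fragments longer than about 200 bp, a region to be screened is usually divided into shorter, overlapping amplicons. The following Python sketch computes such windows; the 200 bp maximum follows the text above, while the 20 bp default overlap is an arbitrary assumption and primer placement constraints are ignored.

```python
def ssca_windows(region_length, max_len=200, overlap=20):
    """Split [0, region_length) into windows of at most max_len bp that
    overlap by `overlap` bp, so every base falls in at least one fragment."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be non-negative and smaller than max_len")
    windows = []
    start = 0
    while start < region_length:
        end = min(start + max_len, region_length)
        windows.append((start, end))
        if end == region_length:
            break
        start += max_len - overlap  # step forward, keeping the overlap
    return windows


print(ssca_windows(450))  # [(0, 200), (180, 380), (360, 450)]
```
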

In order to detect the alteration of the wild-type HPC1 gene in a tissue, it is helpful to isolate the tissue free from surrounding normal tissues. Means for enriching tissue preparation for tumor cells are known in the art. For example, the tissue may be isolated from paraffin or cryostat sections. Cancer cells may also be separated from normal cells by flow cytometry. These techniques, as well as other techniques for separating tumor cells from normal cells, are well known in the art. If the tumor tissue is highly contaminated with normal cells, detection of mutations is more difficult.

Detection of point mutations may be accomplished by molecular cloning of the HPC1 allele(s) and sequencing the allele(s) using techniques well known in the art. Alternatively, the gene sequences can be amplified directly from a genomic DNA preparation from the tumor tissue, using known techniques. The DNA sequence of the amplified sequences can then be determined.

There are six well-known methods for a more complete, yet still indirect, test for confirming the presence of a susceptibility allele: 1) single-stranded conformation analysis (SSCA) (Orita et al., 1989); 2) denaturing gradient gel electrophoresis (DGGE) (Wartell et al., 1990; Sheffield et al., 1989); 3) RNase protection assays (Finkelstein et al., 1990; Kinszler et al., 1991); 4) allele-specific oligonucleotides (ASOs) (Conner et al., 1983); 5) the use of proteins which recognize nucleotide mismatches, such as the E. coli mutS protein (Modrich, 1991); and 6) allele-specific PCR (Rano and Kidd, 1989). For allele-specific PCR, primers are used which hybridize at their 3' ends to a particular HPC1 mutation. If the particular HPC1 mutation is not present, an amplification product is not observed. The Amplification Refractory Mutation System (ARMS) can also be used, as disclosed in European Patent Application Publication No. 0332435 and in Newton et al., 1989. Insertions and deletions of genes can also be detected by cloning, sequencing and amplification. In addition, restriction fragment length polymorphism (RFLP) probes for the gene or surrounding marker genes can be used to score alteration of an allele or an insertion in a polymorphic fragment. Such a method is particularly useful for screening relatives of an affected individual for the presence of the HPC1 mutation found in that individual. Other techniques for detecting insertions and deletions known in the art can also be used.
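
The allele-specific PCR principle, that a primer whose 3'-terminal base pairs only with the mutant allele yields product only from that allele, can be modeled in silico. The Python sketch below is a deliberately simplified model, not a primer-design tool: the primer is written 5' to 3' in the same sense as the template, extension is assumed to fail whenever the 3'-terminal base is mismatched, and the number of tolerated internal mismatches is an assumed parameter (some ARMS-style designs deliberately add a mismatch near the 3' end). The sequences are hypothetical.

```python
def allele_specific_primer(template, variant_pos, alt_base, length=20):
    """Forward primer (5'->3', same sense as `template`) whose 3'-terminal
    base is `alt_base` at 0-based position `variant_pos`."""
    start = variant_pos - length + 1
    if start < 0:
        raise ValueError("variant too close to the 5' end for this primer length")
    return template[start:variant_pos] + alt_base


def predicts_product(template, primer, end_pos, max_internal_mismatches=0):
    """Simplified rule: no product if the 3'-terminal base is mismatched;
    otherwise a product only if internal mismatches are within tolerance."""
    start = end_pos - len(primer) + 1
    if start < 0 or end_pos >= len(template):
        return False
    site = template[start:end_pos + 1]
    if site[-1] != primer[-1]:
        return False  # 3' mismatch: polymerase is assumed not to extend
    internal = sum(a != b for a, b in zip(site[:-1], primer[:-1]))
    return internal <= max_internal_mismatches


wild_type = "AAAAACCCCCGGGGGTTTTTA"              # hypothetical sequence
mutant = wild_type[:19] + "G" + wild_type[20:]   # T -> G at position 19
primer = allele_specific_primer(mutant, 19, "G", length=10)
print(predicts_product(mutant, primer, 19))      # True
print(predicts_product(wild_type, primer, 19))   # False
```
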

In the first three methods (SSCA, DGGE and RNase protection assay), a new electrophoretic band appears. SSCA detects a band which migrates differentially because the sequence change causes a difference in single-strand, intramolecular base pairing. RNase protection involves cleavage of the mutant polynucleotide into two or more smaller fragments. DGGE detects differences in migration rates of mutant sequences compared to wild-type sequences, using a denaturing gradient gel. In an allele-specific oligonucleotide assay, an oligonucleotide is designed which detects a specific sequence, and the assay is performed by detecting the presence or absence of a hybridization signal. In the mutS assay, the protein binds only to sequences that contain a nucleotide mismatch in a heteroduplex between mutant and wild-type sequences.

Mismatches, according to the present invention, are hybridized nucleic acid duplexes in which the two strands are not 100% complementary. Lack of total homology may be due to deletions, insertions, inversions or substitutions. Mismatch detection can be used to detect point mutations in the gene or in its mRNA product. While these techniques are less sensitive than sequencing, they are simpler to perform on a large number of tumor samples. An example of a mismatch cleavage technique is the RNase protection method. In the practice of the present invention, the method involves the use of a labeled riboprobe which is complementary to the human wild-type HPC1 gene coding sequence. The riboprobe and either mRNA or DNA isolated from the tumor tissue are annealed (hybridized) together and subsequently digested with the enzyme RNase A, which is able to detect some mismatches in a duplex RNA structure. If a mismatch is detected by RNase A, the enzyme cleaves at the site of the mismatch. Thus, when the annealed RNA preparation is separated on an electrophoretic gel matrix, if a mismatch has been detected and cleaved by RNase A, an RNA product will be seen which is smaller than the full-length duplex RNA formed by the riboprobe and the mRNA or DNA. The riboprobe need not be the full length of the HPC1 mRNA or gene but can be a segment of either. If the riboprobe comprises only a segment of the HPC1 mRNA or gene, it will be desirable to use a number of these probes to screen the whole mRNA sequence for mismatches.
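
The band pattern expected from the RNase protection assay can be predicted for a given sample sequence. This Python sketch assumes substitutions only (equal-length sequences), assumes that RNase A cleaves at every mismatch (in practice it recognizes only some mismatches, as noted above), and places each cut at the mismatched position, so the fragment sizes are approximate. The sequences are hypothetical.

```python
def protected_fragments(wild_type, sample):
    """Approximate lengths of RNA fragments after RNase A digestion of a
    riboprobe (complementary to `wild_type`) annealed to `sample`."""
    if len(wild_type) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    cuts = [i for i, (w, s) in enumerate(zip(wild_type, sample)) if w != s]
    bounds = [0] + cuts + [len(wild_type)]
    # Zero-length pieces (e.g., a mismatch at position 0) are dropped.
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


print(protected_fragments("ACGTACGTAC", "ACGTACGTAC"))  # [10]  (no mismatch, full length)
print(protected_fragments("ACGTACGTAC", "ACGAACGTAC"))  # [3, 7]
```

A band shorter than the full-length duplex therefore signals a mismatch, and the fragment sizes roughly localize it within the probe.
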

In similar fashion, DNA probes can be used to detect mismatches, through enzymatic or chemical cleavage. See, e.g., Cotton et al., 1988; Shenk et al., 1975; Novack et al., 1986. Alternatively, mismatches can be detected by shifts in the electrophoretic mobility of mismatched duplexes relative to matched duplexes. See, e.g., Cariello, 1988. With either riboprobes or DNA probes, the cellular mRNA or DNA which might contain a mutation can be amplified using PCR (see below) before hybridization. Changes in DNA of the HPC1 gene can also be detected using Southern hybridization, especially if the changes are gross rearrangements, such as deletions and insertions.

DNA sequences of the HPC1 gene which have been amplified by use of PCR may also be screened using allele-specific probes. These probes are nucleic acid oligomers, each of which contains a region of the HPC1 gene sequence harboring a known mutation. For example, one oligomer may be about 30 nucleotides in length (although shorter and longer oligomers are also usable, as is well recognized by those of skill in the art), corresponding to a portion of the HPC1 gene sequence. By use of a battery of such allele-specific probes, PCR amplification products can be screened to identify the presence of a previously identified mutation in the HPC1 gene. Hybridization of allele-specific probes with amplified HPC1 sequences can be performed, for example, on a nylon filter. Hybridization to a particular probe under high stringency hybridization conditions indicates the presence of the same mutation in the tumor tissue as in the allele-specific probe.
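
A battery of allele-specific probes of the kind described above can be assembled and scored in silico. In the Python sketch below, the 30-nucleotide default length follows the example in the text, while treating high-stringency hybridization as an exact match to either strand is a simplifying assumption; real hybridization depends on temperature, salt and probe composition. The amplicon and mutation are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def aso_probe(reference, pos, base, length=30):
    """Probe of `length` nt centred on 0-based `pos`, carrying `base` there."""
    start = pos - length // 2
    end = start + length
    if start < 0 or end > len(reference):
        raise ValueError("probe would extend past the end of the sequence")
    return reference[start:pos] + base + reference[pos + 1:end]


def hybridizes(amplicon, probe):
    """High stringency approximated as a perfect match to either strand."""
    return probe in amplicon or reverse_complement(probe) in amplicon


reference = "ACGT" * 15                         # hypothetical 60 nt amplicon
mutant = reference[:30] + "A" + reference[31:]  # G -> A at position 30
battery = {"wild type": aso_probe(reference, 30, "G"),
           "G30A": aso_probe(reference, 30, "A")}
print({name: hybridizes(mutant, p) for name, p in battery.items()})
# {'wild type': False, 'G30A': True}
```
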

The newly developed technique of nucleic acid analysis via microchip technology is also applicable to the present invention. In this technique, thousands of distinct oligonucleotide probes are built up in an array on a silicon chip. Nucleic acid to be analyzed is fluorescently labeled and hybridized to the probes on the chip. It is also possible to study nucleic acid-protein interactions using these nucleic acid microchips. Using this technique, one can determine the presence of mutations, sequence the nucleic acid being analyzed, or measure expression levels of a gene of interest. The method processes many, even thousands, of probes in parallel and can tremendously increase the rate of analysis. Several papers have been published which use this technique, including Hacia et al., 1996; Shoemaker et al., 1996; Chee et al., 1996; Lockhart et al., 1996; DeRisi et al., 1996; and Lipshutz et al., 1995. This method has already been used to screen people for mutations in the breast cancer gene BRCA1 (Hacia et al., 1996). This new technology has been reviewed in a news article in Chemical and Engineering News (Borman, 1996) and has been the subject of an editorial (Nature Genetics, 1996). Also see Fodor (1997).
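
A common resequencing layout for such arrays places, for each interrogated position, four probes that are identical except for the central base. The Python sketch below simulates that idea under strong assumptions: hybridization signal is approximated by the number of bases matching between probe and sample window, the sample is assumed to differ from the reference only by substitutions, and the probe length is a free parameter rather than a value taken from the cited papers. The sequences are hypothetical.

```python
BASES = "ACGT"


def tiling_probes(reference, length=25):
    """For each position that can sit at the probe centre, four probes that
    differ only at that central base."""
    half = length // 2
    probes = {}
    for pos in range(half, len(reference) - length + half + 1):
        start = pos - half
        left, right = reference[start:pos], reference[pos + 1:start + length]
        probes[pos] = {b: left + b + right for b in BASES}
    return probes


def call_bases(sample, probes, length=25):
    """Call each interrogated base as the probe with the highest simulated
    signal (here: count of matching bases against the sample window)."""
    half = length // 2
    calls = {}
    for pos, variants in probes.items():
        window = sample[pos - half:pos - half + length]
        signal = {b: sum(x == y for x, y in zip(p, window))
                  for b, p in variants.items()}
        calls[pos] = max(signal, key=signal.get)
    return calls


reference = "ACGTACGTAC"                      # hypothetical, short probes for brevity
sample = reference[:4] + "T" + reference[5:]  # A -> T at position 4
calls = call_bases(sample, tiling_probes(reference, 5), 5)
print(calls[4])  # T
```
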

The most definitive test for mutations in a candidate locus is to directly compare genomic HPC1 sequences from cancer patients with those from a control population. Alternatively, one could sequence messenger RNA after amplification, e.g., by PCR, thereby eliminating the necessity of determining the exon structure of the candidate gene.

Mutations from cancer patients falling outside the coding region of HPC1 can be detected by examining the non-coding regions, such as introns and regulatory sequences near or within the HPC1 gene. An early indication that mutations in noncoding regions are important may come from Northern blot experiments that reveal messenger RNA molecules of abnormal size or abundance in cancer patients as compared to control individuals.

Alteration of HPC1 mRNA expression can be detected by any technique known in the art. These include Northern blot analysis, PCR amplification and RNase protection. Diminished mRNA expression indicates an alteration of the wild-type HPC1 gene. Alteration of wild-type HPC1 genes can also be detected by screening for alteration of wild-type HPC1 protein. For example, monoclonal antibodies immunoreactive with HPC1 can be used to screen a tissue. Lack of cognate antigen would indicate an HPC1 mutation. Antibodies specific for products of mutant alleles could also be used to detect mutant HPC1 gene product. Such immunological assays can be done in any convenient format known in the art, including Western blots, immunohistochemical assays and ELISA assays. Any means for detecting an altered HPC1 protein can be used to detect alteration of wild-type HPC1 genes. Functional assays, such as protein binding determinations, can be used. In addition, assays can be used which detect HPC1 biochemical function. Finding a mutant HPC1 gene product indicates alteration of a wild-type HPC1 gene.

Mutant HPC1 genes or gene products can also be detected in other human body samples, such as serum, stool, urine and sputum. The same techniques discussed above for detection of mutant HPC1 genes or gene products in tissues can be applied to other body samples. Cancer cells are sloughed off from tumors and appear in such body samples. In addition, the HPC1 gene product itself may be secreted into the extracellular space and found in these body samples even in the absence of cancer cells. By screening such body samples, a simple early diagnosis can be achieved for many types of cancers. In addition, the progress of chemotherapy or radiotherapy can be monitored more easily by testing such body samples for mutant HPC1 genes or gene products.

The methods of diagnosis of the present invention are applicable to any tumor in which HPC1 has a role in tumorigenesis. The diagnostic method of the present invention is useful for clinicians, so they can decide upon an appropriate course of treatment.

The primer pairs of the present invention are useful for determination of the nucleotide sequence of a particular HPC1 allele using PCR. The pairs of single-stranded DNA primers can be annealed to sequences within or surrounding the HPC1 gene on chromosome 1 in order to prime amplifying DNA synthesis of the HPC1 gene itself. A complete set of these primers allows synthesis of all of the nucleotides of the HPC1 gene coding sequences, i.e., the exons. The set of primers preferably allows synthesis of both intron and exon sequences. Allele-specific primers can also be used. Such primers anneal only to particular HPC1 mutant alleles, and thus will only amplify a product in the presence of the mutant allele as a template.

In order to facilitate subsequent cloning of amplified sequences, primers may have restriction enzyme site sequences appended to their 5' ends. Thus, all nucleotides of the primers are derived from HPC1 sequences or sequences adjacent to HPC1, except for the few nucleotides necessary to form a restriction enzyme site. Such enzymes and sites are well known in the art. The primers themselves can be synthesized using techniques which are well known in the art. Generally, the primers can be made using oligonucleotide synthesizing machines which are commercially available. Given the sequence of the HPC1 open reading frame shown in SEQ ID NOs:1-52 (see Table 9), design of particular primers is well within the skill of the art.
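
Appending a restriction site to a primer's 5' end, as described above, is a simple string operation. In this Python sketch the enzyme table lists the standard recognition sequences of three common enzymes, the example primer is hypothetical (it is not taken from the SEQ ID NOs), and the short "GC" clamp placed 5' of the site reflects the common practice of adding a few extra bases so the enzyme can cut near a DNA end; the clamp length is an assumption.

```python
# Recognition sequences (5'->3'). All three are palindromic, so the same
# site appears on both strands and checking one strand suffices.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def add_site(primer, enzyme, clamp="GC"):
    """Return clamp + recognition site + gene-specific primer (all 5'->3')."""
    return clamp + RECOGNITION_SITES[enzyme] + primer


def cuts_internally(amplicon, enzyme):
    """True if the target amplicon itself contains the site, which would make
    the enzyme a poor choice for cloning that fragment."""
    return RECOGNITION_SITES[enzyme] in amplicon


gene_specific = "ATGCCGTTAGCAAGTC"        # hypothetical gene-specific primer
print(add_site(gene_specific, "EcoRI"))   # GCGAATTCATGCCGTTAGCAAGTC
```
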

The nucleic acid probes provided by the present invention are useful for a number of purposes. They can be used in Southern hybridization to genomic DNA and in the RNase protection method for detecting point mutations already discussed above. The probes can be used to detect PCR amplification products. They may also be used to detect mismatches with the HPC1 gene or mRNA using other techniques.

It has been discovered that individuals with the wild-type HPC1 gene do not have cancer which results from the HPC1 allele. However, mutations which interfere with the function of the HPC1 protein are involved in the pathogenesis of cancer. Thus, the presence of an altered (or a mutant) HPC1 gene which produces a protein having a loss of function, or altered function, directly correlates with an increased risk of cancer. In order to detect an HPC1 gene mutation, a biological sample is prepared and analyzed for a difference between the sequence of the HPC1 allele being analyzed and the sequence of the wild-type HPC1 allele. Mutant HPC1 alleles can be initially identified by any of the techniques described above. The mutant alleles are then sequenced to identify the specific mutation of the particular mutant allele. Alternatively, mutant HPC1 alleles can be initially identified by identifying mutant (altered) HPC1 proteins, using conventional techniques. The mutant alleles are then sequenced to identify the specific mutation for each allele. The mutations, especially those which lead to an altered function of the HPC1 protein, are then used for the diagnostic and prognostic methods of the present invention.
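
The comparison of an analyzed allele with the wild-type sequence can be sketched with Python's standard difflib module. This is a rough illustration only: difflib's matching is not a biological alignment, so an indel within repetitive sequence may be reported at an arbitrary position inside the repeat, and the frameshift call assumes the compared sequences are in-frame coding sequence. The sequences are hypothetical.

```python
from difflib import SequenceMatcher


def describe_variants(wild_type, allele):
    """List (position, ref, alt, kind) differences between a wild-type coding
    sequence and the allele being analyzed."""
    matcher = SequenceMatcher(None, wild_type, allele, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ref, alt = wild_type[i1:i2], allele[j1:j2]
        net = len(alt) - len(ref)
        if net == 0:
            kind = "substitution"
        elif net % 3 == 0:
            kind = "in-frame indel"
        else:
            kind = "frameshift"  # net length change not a multiple of 3
        variants.append((i1, ref, alt, kind))
    return variants


wild_type = "ATGGCCAAGTTTGGA"                          # hypothetical coding sequence
print(describe_variants(wild_type, "ATGGCCTAGTTTGGA"))  # [(6, 'A', 'T', 'substitution')]
print(describe_variants(wild_type, "ATGGCCTTTGGA"))     # in-frame deletion of AAG
```
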



Definitions 

The present invention employs the following definitions: 

"Amplification of Polynucleotides" utilizes methods such as the polymerase chain reaction (PCR), ligation amplification (or ligase chain reaction, LCR) and amplification methods based on the use of Q-beta replicase. Also useful are strand displacement amplification (SDA) and nucleic acid sequence based amplification (NASBA). These methods are well known and widely practiced in the art. See, e.g., U.S. Patents 4,683,195 and 4,683,202 and Innis et al., 1990 (for PCR); Wu et al., 1989a (for LCR); U.S. Patents 5,270,184 and 5,455,166 (for SDA); Spargo et al., 1996 (for thermophilic SDA); and U.S. Patent 5,409,818, Fahy et al., 1991 and Compton, 1991 (for 3SR and NASBA). Reagents and hardware for conducting PCR are commercially available. Primers useful to amplify sequences from the HPC1 region are preferably complementary to, and hybridize specifically to, sequences in the HPC1 region or in regions that flank a target region therein. HPC1 sequences generated by amplification may be sequenced directly. Alternatively, but less desirably, the amplified sequence(s) may be cloned prior to sequence analysis. A method for the direct cloning and sequence analysis of enzymatically amplified genomic segments has been described by Scharf, 1986.
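
For PCR, the amplification referred to above is often summarized by the idealized exponential model below, which assumes a constant per-cycle efficiency E (0 ≤ E ≤ 1) and a starting copy number N_0. Real reactions plateau in later cycles, so the figure is an upper bound rather than a prediction.

```latex
N_n = N_0\,(1 + E)^n, \qquad E = 1,\ n = 30 \;\Rightarrow\; N_{30} = 2^{30} N_0 \approx 1.07 \times 10^{9}\, N_0
```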

"Analyte polynucleotide" and "analyte strand" refer to a single- or double-stranded polynucleotide which is suspected of containing a target sequence, and which may be present in a variety of types of samples, including biological samples.

"Antibodies." The present invention also provides polyclonal and/or monoclonal antibodies and fragments thereof, and immunologic binding equivalents thereof, which are capable of specifically binding to the HPC1 polypeptides and fragments thereof or to polynucleotide sequences from the HPC1 region, particularly from the HPC1 locus or a portion thereof. The term "antibody" is used to refer both to a homogeneous molecular entity and to a mixture, such as a serum product, made up of a plurality of different molecular entities. Polypeptides may be prepared synthetically in a peptide synthesizer, coupled to a carrier molecule (e.g., keyhole limpet hemocyanin) and injected into rabbits over several months. Rabbit sera are tested for immunoreactivity to the HPC1 polypeptide or fragment. Monoclonal antibodies may be made by injecting mice with the protein polypeptides, fusion proteins or fragments thereof. Monoclonal antibodies will be screened by ELISA and tested for specific immunoreactivity with HPC1 polypeptide or fragments thereof. See Harlow and Lane, 1988. These antibodies will be useful in assays as well as pharmaceuticals.

Once a sufficient quantity of desired polypeptide has been obtained, it may be used for various purposes. A typical use is the production of antibodies specific for binding. These antibodies may be either polyclonal or monoclonal, and may be produced by in vitro or in vivo techniques well known in the art. For production of polyclonal antibodies, an appropriate target immune system, typically mouse or rabbit, is selected. Substantially purified antigen is presented to the immune system in a fashion determined by methods appropriate for the animal and by other parameters well known to immunologists. Typical sites for injection are in footpads, intramuscularly, intraperitoneally, or intradermally. Of course, other species may be substituted for mouse or rabbit. Polyclonal antibodies are then purified using techniques known in the art, adjusted for the desired specificity.

An immunological response is usually assayed with an immunoassay. Normally, such immunoassays involve some purification of a source of antigen, for example, that produced by the same cells and in the same fashion as the antigen. A variety of immunoassay methods are well known in the art. See, e.g., Harlow and Lane, 1988, or Goding, 1986.

Monoclonal antibodies with affinities (association constants) of 10^8 M^-1, or preferably 10^9 to 10^10 M^-1 or stronger, will typically be made by standard procedures as described, e.g., in Harlow and Lane, 1988 or Goding, 1986. Briefly, appropriate animals will be selected and the desired immunization protocol followed. After the appropriate period of time, the spleens of such animals are excised and individual spleen cells fused, typically, to immortalized myeloma cells under appropriate selection conditions. Thereafter, the cells are clonally separated and the supernatants of each clone are tested for their production of an appropriate antibody specific for the desired region of the antigen.
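
Antibody affinity is commonly reported either as an association constant K_a (units M^-1) or as its reciprocal, the dissociation constant K_d (units M). The standard equilibrium relation, written here for an antibody Ab binding an antigen Ag, shows how the two scales correspond.

```latex
K_a = \frac{[\mathrm{Ab{\cdot}Ag}]}{[\mathrm{Ab}]\,[\mathrm{Ag}]} = \frac{1}{K_d}, \qquad K_a = 10^{8}\ \mathrm{M}^{-1} \iff K_d = 10^{-8}\ \mathrm{M}
```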

Other suitable techniques involve in vitro exposure of lymphocytes to the antigenic polypeptides or, alternatively, selection of libraries of antibodies in phage or similar vectors. See Huse et al., 1989. The polypeptides and antibodies of the present invention may be used with or without modification. Frequently, polypeptides and antibodies will be labeled by joining, either covalently or non-covalently, a substance which provides for a detectable signal. A wide variety of labels and conjugation techniques are known and are reported extensively in both the scientific and patent literature. Suitable labels include radionuclides, enzymes, substrates, cofactors, inhibitors, fluorescent agents, chemiluminescent agents, magnetic particles and the like. Patents teaching the use of such labels include U.S. Patents 3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant immunoglobulins may be produced (see U.S. Patent 4,816,567).

"Binding partner" refers to a molecule capable of binding a ligand molecule with high specificity, as for example, an antigen and an antigen-specific antibody or an enzyme and its inhibitor. In general, the specific binding partners must bind with sufficient affinity to immobilize the analyte copy/complementary strand duplex (in the case of polynucleotide hybridization) under the isolation conditions. Specific binding partners are known in the art and include, for example, biotin and avidin or streptavidin, IgG and protein A, the numerous known receptor-ligand couples, and complementary polynucleotide strands. In the case of complementary polynucleotide binding partners, the partners are normally at least about 15 bases in length, and may be at least 40 bases in length. It is well recognized by those of skill in the art that lengths shorter than 15, between 15 and 40, and greater than 40 bases may also be used. The polynucleotides may be composed of DNA, RNA, or synthetic nucleotide analogs.

A "biological sample" refers to a sample of tissue or fluid suspected of containing an analyte polynucleotide or polypeptide from an individual including, but not limited to, e.g., plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell culture constituents.

As used herein, the terms "diagnosing" or "prognosing," in the context of neoplasia, are used to indicate 1) the classification of lesions as neoplasia, 2) the determination of the severity of the neoplasia, or 3) the monitoring of the disease progression, prior to, during and after treatment.

"Encode". A polynucleotide is said to "encode" a polypeptide if, in its native state or when manipulated by methods well known to those skilled in the art, it can be transcribed and/or translated to produce the mRNA for and/or the polypeptide or a fragment thereof. The antisense strand is the complement of such a nucleic acid, and the encoding sequence can be deduced therefrom.
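
The relationship between a coding strand, its antisense complement and the encoded polypeptide can be illustrated with the standard genetic code. The Python sketch below assumes an in-frame DNA sequence written 5' to 3', uses the standard (nuclear) code with "*" marking stop codons, and uses a hypothetical example sequence.

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))  # standard code, '*' = stop

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Antisense strand of `seq`, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(coding):
    """Translate an in-frame coding sequence; a trailing partial codon is ignored."""
    return "".join(GENETIC_CODE[coding[i:i + 3]]
                   for i in range(0, len(coding) - 2, 3))


coding = "ATGGCCAAGTAA"                 # hypothetical coding strand
antisense = reverse_complement(coding)  # TTACTTGGCCAT
print(translate(coding))                          # MAK*
print(translate(reverse_complement(antisense)))   # MAK*, deduced from the antisense strand
```
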

"Isolated" or "substantially pure". An "isolated" or "substantially pure" nucleic acid (e.g., an RNA, DNA or a mixed polymer) is one which is substantially separated from other cellular components which naturally accompany a native human sequence or protein, e.g., ribosomes, polymerases, many other human genome sequences and proteins. The term embraces a nucleic acid sequence or protein which has been removed from its naturally occurring environment, and includes recombinant or cloned DNA isolates and chemically synthesized analogs or analogs biologically synthesized by heterologous systems.

"HPC1 Allele" refers to normal alleles of the HPC1 locus as well as alleles carrying variations that predispose individuals to develop cancer of many sites including, for example, breast, ovarian, colorectal and prostate cancer. Such predisposing alleles are also called "HPC1 susceptibility alleles".

"HPC1 Locus", "HPC1 Gene", "HPC1 Nucleic Acids" or "HPC1 Polynucleotide" each refer to polynucleotides, all of which are in the HPC1 region, that are likely to be expressed in normal tissue, certain alleles of which predispose an individual to develop breast, ovarian, colorectal and prostate cancers. Mutations at the HPC1 locus may be involved in the initiation and/or progression of other types of tumors. The locus is indicated in part by mutations that predispose individuals to develop cancer. These mutations fall within the HPC1 region described infra. The HPC1 locus is intended to include coding sequences, intervening sequences and regulatory elements controlling transcription and/or translation. The HPC1 locus is intended to include all allelic variations of the DNA sequence.

These terms, when applied to a nucleic acid, refer to a nucleic acid which encodes an HPC1 polypeptide, fragment, homolog or variant, including, e.g., protein fusions or deletions. The nucleic acids of the present invention will possess a sequence which is either derived from, or substantially similar to, a natural HPC1-encoding gene or one having substantial homology with a natural HPC1-encoding gene or a portion thereof.

The polynucleotide compositions of this invention include RNA, cDNA, genomic DNA, synthetic forms, and mixed polymers, both sense and antisense strands, and may be chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases, as will be readily appreciated by those skilled in the art. Such modifications include, for example, labels, methylation, substitution of one or more of the naturally occurring nucleotides with an analog, internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates, phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), pendent moieties (e.g., polypeptides), intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified linkages (e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other chemical interactions. Such molecules are known in the art and include, for example, those in which peptide linkages substitute for phosphate linkages in the backbone of the molecule.

The present invention provides recombinant nucleic acids comprising all or part of the
HPC1 region. The recombinant construct may be capable of replicating autonomously in a host
cell. Alternatively, the recombinant construct may become integrated into the chromosomal
DNA of the host cell. Such a recombinant polynucleotide comprises a polynucleotide of
genomic, cDNA, semi-synthetic, or synthetic origin which, by virtue of its origin or
manipulation, (1) is not associated with all or a portion of a polynucleotide with which it is
associated in nature; (2) is linked to a polynucleotide other than that to which it is linked in
nature; or (3) does not occur in nature.

Therefore, recombinant nucleic acids comprising sequences otherwise not naturally
occurring are provided by this invention. Although the wild-type sequence may be employed, it
will often be altered, e.g., by deletion, substitution or insertion.

cDNA or genomic libraries of various types may be screened as natural sources of the
nucleic acids of the present invention, or such nucleic acids may be provided by amplification of
sequences resident in genomic DNA or other natural sources, e.g., by PCR. A cDNA library is
normally chosen from a tissue source which is abundant in mRNA for the desired proteins.
Phage libraries are normally preferred, but other types of libraries may be used. Clones of a
library are spread onto plates, transferred to a substrate for screening, denatured and probed for
the presence of desired sequences.

The DNA sequences used in this invention will usually comprise at least about five
codons (15 nucleotides), more usually at least about 7-15 codons, and most preferably at least
about 35 codons. One or more introns may also be present. This number of nucleotides is usually
about the minimal length required for a successful probe that would hybridize specifically with
an HPC1-encoding sequence.

Techniques for nucleic acid manipulation are described generally, for example, in
Sambrook et al., 1989 or Ausubel et al., 1992. Reagents useful in applying such techniques,
such as restriction enzymes and the like, are widely known in the art and commercially available
from such vendors as New England BioLabs, Boehringer Mannheim, Amersham, Promega
Biotec, U.S. Biochemicals, New England Nuclear, and a number of other sources. The
recombinant nucleic acid sequences used to produce fusion proteins of the present invention may
be derived from natural or synthetic sequences. Many natural gene sequences are obtainable
from various cDNA or genomic libraries using appropriate probes. See GenBank, National
Institutes of Health.

"HPC1 Region" refers to a portion of human chromosome 1 bounded by the markers
mM.GAAA.158;23.4 and mM.GA57el5.S16. This region contains the HPC1 locus, including
the HPC1 gene.

As used herein, the terms "HPC1 locus", "HPC1 allele" and "HPC1 region" all refer
to the double-stranded DNA comprising the locus, allele, or region, as well as either of the
single-stranded DNAs comprising the locus, allele or region.

As used herein, a "portion" of the HPC1 locus or region or allele is defined as having a
minimal size of at least about eight nucleotides, or preferably about 15 nucleotides, or more
preferably at least about 25 nucleotides, and may have a minimal size of at least about 40
nucleotides. This definition includes all sizes in the range of 8-40 nucleotides as well as greater
than 40 nucleotides. Thus, this definition includes nucleic acids of 8, 12, 15, 20, 25, 40, 60, 80,
100, 200, 300, 400, 500 nucleotides, or nucleic acids having any number of nucleotides within
these ranges of values (e.g., 9, 10, 11, 16, 23, 30, 38, 50, 72, 121, etc., nucleotides), or nucleic
acids having more than 500 nucleotides. The present invention includes all novel nucleic acids
having at least 8 nucleotides derived from any of SEQ ID NOs:1-52 and any combination of
these sequences as described in further detail below, their complements, or functionally
equivalent nucleic acid sequences. The present invention does not include nucleic acids which
exist in the prior art. That is, the present invention includes all nucleic acids having at least 8
nucleotides derived from any of SEQ ID NOs:1-52 and any combination of these sequences as
described in further detail below, with the proviso that it does not include nucleic acids existing
in the prior art.

"HPC1 protein" or "HPC1 polypeptide" refers to a protein or polypeptide encoded by
the HPC1 locus, variants or fragments thereof. The term "polypeptide" refers to a polymer of
amino acids and its equivalent and does not refer to a specific length of the product; thus,
peptides, oligopeptides and proteins are included within the definition of a polypeptide. This
term also does not refer to or exclude modifications of the polypeptide, for example,
glycosylations, acetylations, phosphorylations, and the like. Included within the definition are,
for example, polypeptides containing one or more analogs of an amino acid (including, for
example, unnatural amino acids, etc.), polypeptides with substituted linkages as well as other
modifications known in the art, both naturally and non-naturally occurring. Ordinarily, such
polypeptides will be at least about 50% homologous to the native HPC1 sequence, preferably in
excess of about 90%, and more preferably at least about 95% homologous. Also included are
proteins encoded by DNA which hybridizes under high or low stringency conditions to
HPC1-encoding nucleic acids and closely related polypeptides or proteins retrieved by antisera
to the HPC1 protein(s).

An HPC1 polypeptide may be derived from any of the exons described herein and
may be in isolated and/or purified form, free or substantially free of material with which it is
naturally associated. The polypeptide may, if produced by expression in a prokaryotic cell or
produced synthetically, lack native post-translational processing, such as glycosylation.
Alternatively, the present invention is also directed to polypeptides which are sequence variants,
alleles or derivatives of an HPC1 polypeptide. Such polypeptides may have an amino acid
sequence which differs from that derived from any of the exons described herein by one or more
additions, substitutions, deletions or insertions of one or more amino acids. Preferred such
polypeptides have HPC1 function.

Substitutional variants typically contain the exchange of one amino acid for another at
one or more sites within the protein, and may be designed to modulate one or more properties of
the polypeptide, such as stability against proteolytic cleavage, without the loss of other functions
or properties. Amino acid substitutions may be made on the basis of similarity in polarity,
charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature of the residues
involved. Preferred substitutions are conservative ones, that is, substitutions in which one amino
acid is replaced with one of similar shape and charge. Conservative substitutions are well known
in the art and typically include substitutions within the following groups: glycine, alanine; valine,
isoleucine, leucine; aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine,
arginine; and tyrosine, phenylalanine.
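The grouping above can be expressed as a simple lookup table. The following is a minimal Python sketch, assuming one-letter amino acid codes and two equal-length, ungapped sequences; the function names `is_conservative` and `substitutions` are hypothetical and not part of any established tool.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"G", "A"},       # glycine, alanine
    {"V", "I", "L"},  # valine, isoleucine, leucine
    {"D", "E"},       # aspartic acid, glutamic acid
    {"N", "Q"},       # asparagine, glutamine
    {"S", "T"},       # serine, threonine
    {"K", "R"},       # lysine, arginine
    {"Y", "F"},       # tyrosine, phenylalanine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall within the same group listed above."""
    if original == replacement:
        return True
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


def substitutions(seq_a: str, seq_b: str) -> list[tuple[int, str, str, bool]]:
    """List (1-based position, original, replacement, conservative?) for each
    differing position of two equal-length, ungapped protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [
        (i + 1, a, b, is_conservative(a, b))
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


print(substitutions("MKVLE", "MRVLD"))  # [(2, 'K', 'R', True), (5, 'E', 'D', True)]
```

Residues outside every group (for example C, H, M, P, W) are treated here as having no conservative partner other than themselves.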

Certain amino acids may be substituted for other amino acids in a protein structure
without appreciable loss of interactive binding capacity with structures such as, for example,
antigen-binding regions of antibodies, binding sites on substrate molecules or binding sites on
proteins interacting with an HPC1 polypeptide. Since it is the interactive capacity and nature of
a protein that define its biological functional activity, certain amino acid substitutions can be
made in a protein sequence, and in its underlying DNA coding sequence, while still yielding a
protein with like properties. In making such changes, the hydropathic index of amino acids may
be considered. The importance of the hydropathic amino acid index in conferring interactive
biological function on a protein is generally understood in the art (Kyte and Doolittle, 1982).
Alternatively, the substitution of like amino acids can be made effectively on the basis of
hydrophilicity. The importance of hydrophilicity in conferring interactive biological function on
a protein is generally understood in the art (U.S. Patent 4,554,101). The use of the hydropathic
index or hydrophilicity in designing polypeptides is further discussed in U.S. Patent 5,691,198.
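As an illustration of using the hydropathic index, the Python sketch below looks up the Kyte and Doolittle (1982) values and reports how far a single substitution moves the index. The ±2.0 window is an assumption chosen for illustration, not a threshold stated in this text, and the function names are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_shift(original: str, replacement: str) -> float:
    """Absolute change in hydropathic index caused by one substitution."""
    return abs(KYTE_DOOLITTLE[original] - KYTE_DOOLITTLE[replacement])


def within_window(original: str, replacement: str, window: float = 2.0) -> bool:
    """True if the substitution changes the index by at most `window` units.
    The default window is an illustrative assumption."""
    return hydropathy_shift(original, replacement) <= window


print(within_window("I", "V"))  # True  (shift of about 0.3)
print(within_window("I", "R"))  # False (shift of 9.0)
```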

The length of polypeptide sequences compared for homology will generally be at least
about 16 amino acids, usually at least about 20 residues, more usually at least about 24 residues,
typically at least about 28 residues, and preferably more than about 35 residues.

"Operably linked" refers to a juxtaposition wherein the components so described are in
a relationship permitting them to function in their intended manner. For instance, a promoter is
operably linked to a coding sequence if the promoter affects its transcription or expression.

The term "peptide mimetic" or "mimetic" is intended to refer to a substance which has the
essential biological activity of an HPC1 polypeptide. A peptide mimetic may be a
peptide-containing molecule that mimics elements of protein secondary structure (Johnson et
al., 1993). The underlying rationale behind the use of peptide mimetics is that the peptide
backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate
molecular interactions, such as those of antibody and antigen, enzyme and substrate or
scaffolding proteins. A peptide mimetic is designed to permit molecular interactions similar to
those of the natural molecule. A mimetic may not be a peptide at all, but it will retain the
essential biological activity of a natural HPC1 polypeptide.

"Probes". Polynucleotide polymorphisms associated with HPC1 alleles which
predispose to certain cancers or are associated with most cancers are detected by hybridization
with a polynucleotide probe which forms a stable hybrid with the target sequence under highly
stringent to moderately stringent hybridization and wash conditions. If the probes are expected
to be perfectly complementary to the target sequence, high stringency conditions will be used.
Hybridization stringency may be lessened if some mismatching is expected, for example, if
variants are expected with the result that the probe will not be completely complementary.
Conditions are chosen which rule out nonspecific/adventitious binding, that is, which minimize
noise. (Throughout this disclosure, a statement that "stringent" conditions are used is to be read
as meaning that "high stringency" conditions are used.) Because such hybridization signals
identify neutral DNA polymorphisms as well as mutations, they require further analysis to
demonstrate detection of an HPC1 susceptibility allele.

Probes for HPC1 alleles may be derived from the sequences of the HPC1 region or its
cDNAs. The probes may be of any suitable length that spans all or a portion of the HPC1
region and allows specific hybridization to the HPC1 region. If the target sequence
contains a sequence identical to that of the probe, the probes may be short, e.g., in the range of
about 8-30 base pairs, since the hybrid will be relatively stable under even highly stringent
conditions. If some degree of mismatch is expected with the probe, i.e., if it is suspected that the
probe will hybridize to a variant region, a longer probe may be employed which hybridizes to
the target sequence with the requisite specificity.

The probes will include an isolated polynucleotide attached to a label or reporter
molecule and may be used to isolate other polynucleotide sequences having sequence similarity
by standard methods. For techniques for preparing and labeling probes see, e.g., Sambrook et
al., 1989 or Ausubel et al., 1992. Other similar polynucleotides may be selected by using
homologous polynucleotides. Alternatively, polynucleotides encoding these or similar
polypeptides may be synthesized or selected by use of the redundancy in the genetic code.
Various codon substitutions may be introduced, e.g., by silent changes (thereby producing
various restriction sites) or to optimize expression for a particular system. Mutations may be
introduced to modify the properties of the polypeptide, perhaps to change ligand-binding
affinities, interchain affinities, or the polypeptide degradation or turnover rate.
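A silent change that produces a restriction site can be found by trying each synonymous codon in turn. The Python sketch below illustrates the idea with a deliberately partial codon table (four amino acids from the standard genetic code) and the EcoRI recognition sequence GAATTC; the function name and the example sequence are hypothetical.

```python
# Partial table of synonymous codons (standard genetic code), four amino acids only.
SYNONYMS = {
    "Lys": ["AAA", "AAG"],
    "Glu": ["GAA", "GAG"],
    "Phe": ["TTT", "TTC"],
    "Asn": ["AAT", "AAC"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def silent_swaps_creating_site(cds: str, site: str) -> list[tuple[int, str, str]]:
    """Return (codon number, old codon, new codon) for every single synonymous
    codon change that introduces `site` into an in-frame coding sequence."""
    if site in cds:
        return []  # the site is already present; nothing to introduce
    hits = []
    for n in range(len(cds) // 3):
        old = cds[3 * n : 3 * n + 3]
        aa = CODON_TO_AA.get(old)
        if aa is None:
            continue  # codon not covered by the partial table above
        for new in SYNONYMS[aa]:
            if new != old and site in cds[: 3 * n] + new + cds[3 * n + 3 :]:
                hits.append((n + 1, old, new))
    return hits


# Lys-Glu-Phe-Asn: changing GAG to GAA (both Glu) creates an EcoRI site.
print(silent_swaps_creating_site("AAGGAGTTCAAC", "GAATTC"))  # [(2, 'GAG', 'GAA')]
```

A practical version would use the full codon table and check both strands, but the encoded protein is unchanged in every hit reported by this sketch.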

Probes comprising synthetic oligonucleotides or other polynucleotides of the present
invention may be derived from naturally occurring or recombinant single- or double-stranded
polynucleotides, or be chemically synthesized. Probes may also be labeled by nick translation,
Klenow fill-in reaction, or other methods known in the art.

Portions of the polynucleotide sequence having at least about eight nucleotides, usually
at least about 15 nucleotides, and fewer than about 6 kb, usually fewer than about 1.0 kb, from a
polynucleotide sequence encoding HPC1 are preferred as probes. Thus, this definition includes
probes of 8, 12, 15, 20, 25, 40, 60, 80, 100, 200, 300, 400 or 500 nucleotides or probes having
any number of nucleotides within these ranges of values (e.g., 9, 10, 11, 16, 23, 30, 38, 50, 72,
121, etc., nucleotides), or probes having more than 500 nucleotides. The probes may also be
used to determine whether mRNA encoding HPC1 is present in a cell or tissue. The present
invention includes all novel probes having at least 8 nucleotides derived from any of SEQ ID
NOs:1-52 and any combination of these sequences as described in further detail below, their
complements, or functionally equivalent nucleic acid sequences. The present invention does not
include probes which exist in the prior art. That is, the present invention includes all probes
having at least 8 nucleotides derived from any of SEQ ID NOs:1-52 and any combination of
these sequences as described in further detail below, with the proviso that they do not include
probes existing in the prior art.
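Enumerating candidate probes of a given length, together with their complementary (antisense) strands, is a simple sliding-window operation. The Python sketch below is illustrative only: the function names are hypothetical, and it does not assess hybridization specificity or the prior-art proviso. The 8-nucleotide floor mirrors the minimum length stated above.

```python
from collections.abc import Iterator

# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (antisense strand, read 5'->3') of DNA."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_probes(target: str, length: int = 15,
                     step: int = 1) -> Iterator[tuple[int, str, str]]:
    """Yield (1-based start, sense probe, antisense probe) for each window."""
    if length < 8:
        raise ValueError("the text considers probes of at least about 8 nucleotides")
    for i in range(0, len(target) - length + 1, step):
        window = target[i : i + length]
        yield i + 1, window, reverse_complement(window)


# First 12 bases of SEQ ID NO:53, split into overlapping 10-base windows.
for start, sense, antisense in candidate_probes("ACCGTAGCTACG", length=10):
    print(start, sense, antisense)
```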

Similar considerations and nucleotide lengths are also applicable to primers which may
be used for the amplification of all or part of the HPC1 gene. Thus, a definition for primers
includes primers of 8, 12, 15, 20, 25, 40, 60, 80, 100, 200, 300, 400, 500 nucleotides, or primers
having any number of nucleotides within these ranges of values (e.g., 9, 10, 11, 16, 23, 30, 38,
50, 72, 121, etc., nucleotides), or primers having more than 500 nucleotides, or any number of
nucleotides between 500 and 9000. The primers may also be used to determine whether mRNA
encoding HPC1 is present in a cell or tissue. The present invention includes all novel primers
having at least 8 nucleotides derived from the HPC1 locus for amplifying the HPC1 gene, its
complement or functionally equivalent nucleic acid sequences. The present invention does not
include primers which exist in the prior art. That is, the present invention includes all primers
having at least 8 nucleotides with the proviso that they do not include primers existing in the
prior art.

"Protein modifications or fragments" are provided by the present invention for HPC1
polypeptides or fragments thereof which are substantially homologous to the primary structural
sequence but which include, e.g., in vivo or in vitro chemical and biochemical modifications or
which incorporate unusual amino acids. Such modifications include, for example, acetylation,
carboxylation, phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides,
and various enzymatic modifications, as will be readily appreciated by those well skilled in the
art. A variety of methods for labeling polypeptides and of substituents or labels useful for such
purposes are well known in the art, and include radioactive isotopes such as 32P, ligands which
bind to labeled antiligands (e.g., antibodies), fluorophores, chemiluminescent agents, enzymes,
and antiligands which can serve as specific binding pair members for a labeled ligand. The
choice of label depends on the sensitivity required, ease of conjugation with the primer, stability
requirements, and available instrumentation. Methods of labeling polypeptides are well known
in the art. See Sambrook et al., 1989 or Ausubel et al., 1992.

Besides substantially full-length polypeptides, the present invention provides for
biologically active fragments of the polypeptides. Significant biological activities include
ligand-binding, immunological activity and other biological activities characteristic of HPC1
polypeptides. Immunological activities include both immunogenic function in a target immune
system and sharing of immunological epitopes for binding, serving as either a competitor
or substitute antigen for an epitope of the HPC1 protein. As used herein, "epitope" refers to an
antigenic determinant of a polypeptide. An epitope could comprise three amino acids in a
spatial conformation which is unique to the epitope. Generally, an epitope consists of at least
five such amino acids, and more usually consists of at least 8-10 such amino acids. Methods of
determining the spatial conformation of such amino acids are known in the art.

For immunological purposes, tandem-repeat polypeptide segments may be used as
immunogens, thereby producing highly antigenic proteins. Alternatively, such polypeptides will
serve as highly efficient competitors for specific binding. Production of antibodies specific for
HPC1 polypeptides or fragments thereof is described below.

The present invention also provides for fusion polypeptides, comprising HPC1
polypeptides and fragments. Homologous polypeptides may be fusions between two or more
HPC1 polypeptide sequences or between the sequences of HPC1 and a related protein.
Likewise, heterologous fusions may be constructed which would exhibit a combination of
properties or activities of the derivative proteins. For example, ligand-binding or other domains
may be "swapped" between different new fusion polypeptides or fragments. Such homologous
or heterologous fusion polypeptides may display, for example, altered strength or specificity of
binding. Fusion partners include immunoglobulins, bacterial β-galactosidase, trpE, protein A,
β-lactamase, alpha amylase, alcohol dehydrogenase and yeast alpha mating factor. See
Godowski et al., 1988.

Fusion proteins will typically be made by either recombinant nucleic acid methods, as 
described below, or may be chemically synthesized. Techniques for the synthesis of 
polypeptides are described, for example, in Merrifield, 1963. 
"Protein purification" refers to any of various methods, well known in the art, for the
isolation of the HPC1 polypeptides from other biological material, such as from cells
transformed with recombinant nucleic acids encoding HPC1. For example, such polypeptides
may be purified by immunoaffinity chromatography employing, e.g., the antibodies provided by
the present invention. Various methods of protein purification are well known in the art, and
include those described in Deutscher, 1990 and Scopes, 1982.

The terms "isolated", "substantially pure", and "substantially homogeneous" are used
interchangeably to describe a protein or polypeptide which has been separated from components
which accompany it in its natural state. A monomeric protein is substantially pure when at least
about 60 to 75% of a sample exhibits a single polypeptide sequence. A substantially pure
protein will typically comprise about 60 to 90% W/W of a protein sample, more usually about
95%, and preferably will be over about 99% pure. Protein purity or homogeneity may be
indicated by a number of means well known in the art, such as polyacrylamide gel
electrophoresis of a protein sample, followed by visualizing a single polypeptide band upon
staining the gel. For certain purposes, higher resolution may be provided by using HPLC or
other means well known in the art which are utilized for purification.

An HPC1 protein is substantially free of naturally associated components when it is
separated from the native contaminants which accompany it in its natural state. Thus, a
polypeptide which is chemically synthesized or synthesized in a cellular system different from
the cell from which it naturally originates will be substantially free from its naturally associated
components. A protein may also be rendered substantially free of naturally associated
components by isolation, using protein purification techniques well known in the art.

A polypeptide produced as an expression product of an isolated and manipulated genetic
sequence is an "isolated polypeptide," as used herein, even if expressed in a homologous cell
type. Synthetically made forms or molecules expressed by heterologous cells are inherently
isolated molecules.

"Recombinant nucleic acid" is a nucleic acid which is not naturally occurring, or which
is made by the artificial combination of two otherwise separated segments of sequence. This
artificial combination is often accomplished either by chemical synthesis or by the artificial
manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques.
This is usually done to replace a codon with a redundant codon encoding the same or a
conservative amino acid, while typically introducing or removing a sequence recognition site.
Alternatively, it is performed to join together nucleic acid segments of desired functions to
generate a desired combination of functions.
"Regulatory sequences" refers to those sequences, normally within 100 kb of the coding
region of a locus but possibly more distant from it, which affect the expression of the gene
(including transcription of the gene, and translation, splicing, stability or the like of the
messenger RNA).

"Substantial homology or similarity". A nucleic acid or fragment thereof is
"substantially homologous" (or "substantially similar") to another if, when optimally aligned
(with appropriate nucleotide insertions or deletions) with the other nucleic acid (or its
complementary strand), there is nucleotide sequence identity in at least about 60% of the
nucleotide bases, usually at least about 70%, more usually at least about 80%, preferably at least
about 90%, and more preferably at least about 95-98% of the nucleotide bases.

To determine homology between two different nucleic acids, the percent homology is to
be determined using the BLASTN program "BLAST 2 Sequences". This program is available
for public use from the National Center for Biotechnology Information (NCBI) over the Internet
(http://www.ncbi.nlm.nih.gov/gorf/bl2.html) (Altschul et al., 1997). The parameters to be used
are whichever combination of the following yields the highest calculated percent homology (as
calculated below), with the default parameters shown in parentheses:
Program - blastn
Matrix - BLOSUM62
Reward for a match - 0 or 1 (1)
Penalty for a mismatch - 0, -1, -2 or -3 (-2)
Open gap penalty - 0, 1, 2, 3, 4 or 5 (5)
Extension gap penalty - 0 or 1 (1)
Gap x_dropoff - 0 or 50 (50)
Expect - 10
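Because the highest-scoring combination is to be used, the rule above amounts to an exhaustive search over the listed values (2 × 4 × 6 × 2 × 2 = 192 combinations). The Python sketch below does not run BLAST: the key names are hypothetical labels, `score_fn` stands in for an actual "BLAST 2 Sequences" comparison, and the matrix and expect value are held fixed. As a stand-in scorer it uses the four results reported in Table A, scoring every other combination as 0.

```python
from itertools import product

# Values permitted for each adjustable setting listed above (default in comment).
PARAMETER_GRID = {
    "reward": [0, 1],                # default 1
    "penalty": [0, -1, -2, -3],      # default -2
    "gap_open": [0, 1, 2, 3, 4, 5],  # default 5
    "gap_extend": [0, 1],            # default 1
    "xdrop_gap": [0, 50],            # default 50
}


def all_settings():
    """Yield every combination of the listed values as a dict."""
    names = list(PARAMETER_GRID)
    for values in product(*PARAMETER_GRID.values()):
        yield dict(zip(names, values))


def best_homology(score_fn):
    """Return (highest percent homology, settings producing it).

    score_fn is a caller-supplied callable that takes one settings dict and
    returns the percent homology obtained with those settings.
    """
    return max(((score_fn(s), s) for s in all_settings()), key=lambda pair: pair[0])


# Stand-in for real BLAST runs: the four results of Table A; others score 0.
TABLE_A = {
    (1, -2, 5, 1, 50): 71.3,
    (1, -2, 2, 1, 50): 83.7,
    (1, -1, 5, 1, 50): 44.3,
    (1, -1, 2, 1, 50): 87.1,
}
best_score, best_settings = best_homology(lambda s: TABLE_A.get(tuple(s.values()), 0.0))
print(best_score, best_settings)
```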

Along with a variety of other results, this program shows a percent identity across the
complete strands or across regions of the two nucleic acids being matched. The program shows
as part of the results an alignment and identity of the two strands being compared. If the strands
are of equal length, then the identity will be calculated across the complete length of the nucleic
acids. If the strands are of unequal lengths, then the length of the shorter nucleic acid is to be
used. If the nucleic acids are quite similar across a portion of their sequences but different
across the rest of their sequences, the blastn program "BLAST 2 Sequences" will show an
identity across only the similar portions, and these portions are reported individually. For
purposes of determining homology herein, the percent homology refers to the shorter of the two
sequences being compared. If any one region is shown in different alignments with differing
percent identities, the alignments which yield the greatest homology are to be used. The
averaging is to be performed as in the following example using SEQ ID NOs:53 and 54.
5'-ACCGTAGCTACGTACGTATATAGAAAGGGCGCGATCGTCGTCGCGTATGACGACTTAGCATGC-3' (SEQ ID NO:53)

5'-ACCGGTAGCTACGTACGTTATTTAGAAAGGGGTGTGTGTGTGTGTGTAA
ACCGGGGTTTTCGGGATCGTCCGTCGCGTATGACGACTTAGCCATGCACGGTATAT
CGTATTAGGACTAGCGATTGACTAG-3' (SEQ ID NO:54)

The program "BLAST 2 Sequences" shows differing alignments of these two nucleic
acids depending upon the parameters which are selected. As examples, four sets of parameters
were selected for comparing SEQ ID NOs:53 and 54 (gap x_dropoff was 50 for all cases), with
the results shown in Table A. None of the parameter sets shown in Table A is necessarily the
best set for comparing these sequences. The percent homology is calculated by multiplying, for
each region showing identity, the fraction of bases of the shorter strand within that region by the
percent identity for that region, and adding all of these products together. For example, using
the first set of parameters shown in Table A, SEQ ID NO:53 is the short sequence (63 bases),
and two regions of identity are shown, the first encompassing bases 4-29 (26 bases) of SEQ ID
NO:53 with 92% identity to SEQ ID NO:54 and the second encompassing bases 39-59 (21
bases) of SEQ ID NO:53 with 100% identity to SEQ ID NO:54. Bases 1-3, 30-38 and 60-63
(16 bases) are not shown as having any identity with SEQ ID NO:54. Percent homology is
calculated as: (26/63)(92) + (21/63)(100) + (16/63)(0) = 71.3% homology. The percent
homologies calculated using each of the four sets of parameters shown are listed in Table A.
Several other combinations of parameters are possible, but they are not listed for the sake of
brevity. Each set of parameters resulted in a different calculated percent homology. Because the
result yielding the highest percent homology is to be used, based solely on these four sets of
parameters one would state that SEQ ID NOs:53 and 54 have 87.1% homology. Again, use of
other parameters may show an even higher homology for SEQ ID NOs:53 and 54, but for
brevity not all the possible results are shown.
TABLE A

Parameter values
Match  Mismatch  Open gap  Extension gap  Regions of identity (%)                                 Homology (%)
1      -2        5         1              4-29 of SEQ ID NO:53 and 5-31 of SEQ ID NO:54 (92%)     71.3
                                          39-59 of SEQ ID NO:53 and 71-91 of SEQ ID NO:54 (100%)
1      -2        2         1              4-29 of SEQ ID NO:53 and 5-31 of SEQ ID NO:54 (92%)     83.7
                                          33-63 of SEQ ID NO:53 and 64-96 of SEQ ID NO:54 (93%)
1      -1        5         1              30-59 of SEQ ID NO:53 and 61-91 of SEQ ID NO:54 (93%)   44.3
1      -1        2         1              4-29 of SEQ ID NO:53 and 5-31 of SEQ ID NO:54 (92%)     87.1
                                          30-63 of SEQ ID NO:53 and 61-96 of SEQ ID NO:54 (91%)
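The weighted average used for Table A can be written as a short function. The following Python sketch assumes that the reported regions do not overlap and that positions are given in the shorter sequence (here SEQ ID NO:53, 63 bases); the `Region` type and `percent_homology` name are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int       # first base of the region in the shorter sequence (1-based)
    end: int         # last base of the region in the shorter sequence (inclusive)
    identity: float  # percent identity reported for the region


def percent_homology(short_length: int, regions: list[Region]) -> float:
    """Sum each region's percent identity weighted by the fraction of the
    shorter sequence it covers; uncovered bases contribute 0%.
    Assumes the regions do not overlap."""
    return sum((r.end - r.start + 1) / short_length * r.identity for r in regions)


# Regions of SEQ ID NO:53 for each parameter set in Table A.
TABLE_A_REGIONS = [
    [Region(4, 29, 92), Region(39, 59, 100)],
    [Region(4, 29, 92), Region(33, 63, 93)],
    [Region(30, 59, 93)],
    [Region(4, 29, 92), Region(30, 63, 91)],
]
for regions in TABLE_A_REGIONS:
    print(round(percent_homology(63, regions), 1))  # prints 71.3, 83.7, 44.3, 87.1 in turn
```

The rounded values reproduce the Homology column of Table A, and the largest of them (87.1) is the figure to be reported.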



"Identity" means the degree of sequence relatedness between two polypeptide or two
polynucleotide sequences as determined by the identity of the match between two strings of
such sequences. Identity can be readily calculated. While there exist a number of methods to
measure identity between two polynucleotide or polypeptide sequences, the term "identity" is
well known to skilled artisans (Computational Molecular Biology, Lesk, A. M., ed., Oxford
University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D.
W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part I, Griffin,
A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in
Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer,
Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). Methods commonly
employed to determine identity between two sequences include, but are not limited to, those
disclosed in Guide to Huge Computers, Martin J. Bishop, ed., Academic Press, San Diego, 1994,
and Carillo, H., and Lipman, D. (1988). Preferred methods to determine identity are designed to
give the largest match between the two sequences tested. Such methods are codified in computer
programs. Preferred computer program methods to determine identity between two sequences
include, but are not limited to, the GCG program package (Devereux et al., 1984), BLASTP,
BLASTN and FASTA (Altschul et al., 1990).
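For a single pairwise alignment, percent identity is the share of aligned positions holding identical residues. The Python sketch below uses one common convention, assumed here for illustration: columns in which either sequence has a gap are excluded from the denominator. Programs differ on this point, and the result is not the weighted "percent homology" defined earlier for SEQ ID NOs:53 and 54.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical positions over aligned columns, ignoring any column
    in which either sequence has a gap character."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


print(percent_identity("ACGT-ACGT", "ACGTTACCT"))  # 87.5 (7 of 8 ungapped columns)
```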

Alternatively, substantial homology or similarity exists when a nucleic acid or fragment
thereof will hybridize to another nucleic acid (or a complementary strand thereof) under
selective hybridization conditions. Hybridization is selective when it occurs with substantially
more specificity than would a totally nonspecific interaction. Typically, selective hybridization
will occur when there is at least about 55% homology over a stretch of at least about 14
nucleotides, preferably at least about 65%, more preferably at least about 75%, and most
preferably at least about 90%. See Kanehisa, 1984. The length of homology comparison, as
described, may be over longer stretches, and in certain embodiments will often be over a stretch
of at least about nine nucleotides, usually at least about 20 nucleotides, more usually at least
about 24 nucleotides, typically at least about 28 nucleotides, more typically at least about 32
nucleotides, and preferably at least about 36 or more nucleotides.
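The sequence criterion above (at least about 55% homology over a stretch of at least about 14 nucleotides) can be screened computationally on an existing alignment with a sliding window. The Python sketch below is a sequence-level check only and says nothing about actual hybridization, which also depends on the conditions discussed next; the function name is hypothetical, and counting gap positions as mismatches is an assumption.

```python
def has_selective_stretch(aligned_a: str, aligned_b: str,
                          window: int = 14, min_identity: float = 55.0) -> bool:
    """True if any run of `window` consecutive aligned positions has at least
    `min_identity` percent identical bases; gap positions count as mismatches."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    same = [a == b and a != "-" for a, b in zip(aligned_a.upper(), aligned_b.upper())]
    for start in range(len(same) - window + 1):
        if 100.0 * sum(same[start : start + window]) / window >= min_identity:
            return True
    return False


print(has_selective_stretch("A" * 14, "A" * 8 + "C" * 6))  # True  (8/14, about 57%)
print(has_selective_stretch("A" * 14, "A" * 7 + "C" * 7))  # False (7/14 = 50%)
```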

Nucleic acid hybridization will be affected by such conditions as salt concentration,
temperature, or organic solvents, in addition to the base composition, length of the
complementary strands, and the number of nucleotide base mismatches between the hybridizing
nucleic acids, as will be readily appreciated by those skilled in the art. Stringent temperature
conditions will generally include temperatures in excess of 30°C, typically in excess of 37°C,
and preferably in excess of 45°C. Stringent salt conditions will ordinarily be less than 1000 mM,
typically less than 500 mM, and preferably less than 200 mM. However, the combination of
parameters is much more important than the measure of any single parameter. See, e.g., Wetmur
and Davidson, 1968.
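
To illustrate why the combination of parameters matters, the sketch below estimates the melting temperature (Tm) of a perfectly matched DNA duplex from probe length, G+C content, sodium concentration and formamide concentration. It assumes one commonly quoted empirical approximation; the constants (81.5, 16.6, 0.41, 0.63, 600) are assumptions that vary between sources, and `estimate_tm` is a hypothetical helper, not a method described in this application.

```python
import math


def estimate_tm(length_nt: int, gc_percent: float, na_molar: float,
                formamide_percent: float = 0.0) -> float:
    """Rough melting temperature (deg C) of a perfectly matched DNA duplex.

    Uses an empirical approximation of the form
        Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 0.63*(%formamide) - 600/N
    The constants are assumptions; published variants differ.
    """
    if length_nt <= 0 or na_molar <= 0:
        raise ValueError("length and sodium concentration must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / length_nt)


if __name__ == "__main__":
    # A hypothetical 36-nt probe with 50% GC at three sodium concentrations.
    for na in (1.0, 0.5, 0.2):
        print(na, round(estimate_tm(36, 50.0, na), 1))
```

Lowering the salt concentration or adding formamide lowers the estimated Tm, so a hybridization or wash performed at a fixed temperature becomes more stringent; a longer probe raises Tm.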

Probe sequences may also hybridize specifically to duplex DNA under certain conditions 
to form triplex or other higher order DNA complexes. The preparation of such probes and 
suitable hybridization conditions are well known in the art. 

The terms "substantial homology" or "substantial identity", when referring to
polypeptides, indicate that the polypeptide or protein in question exhibits at least about 30%
identity with an entire naturally-occurring protein or a portion thereof, usually at least about 70%
identity, and preferably at least about 95% identity.

Homology, for polypeptides, is typically measured using sequence analysis software.
See, e.g., the Sequence Analysis Software Package of the Genetics Computer Group, University
of Wisconsin Biotechnology Center, 910 University Avenue, Madison, Wisconsin 53705.
Protein analysis software matches similar sequences using measures of homology assigned to
various substitutions, deletions and other modifications. Conservative substitutions typically
include substitutions within the following groups: glycine, alanine; valine, isoleucine, leucine;
aspartic acid, glutamic acid; asparagine, glutamine; serine, threonine; lysine, arginine; and
phenylalanine, tyrosine.
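
The conservative-substitution groups listed above can be encoded directly. In this sketch the groups are written with standard one-letter amino acid codes; `is_conservative` and `classify_positions` are hypothetical helpers, and real protein analysis software typically scores substitutions with a full substitution matrix rather than simple groups.

```python
# Groups taken from the list above, using one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    frozenset("GA"),   # glycine, alanine
    frozenset("VIL"),  # valine, isoleucine, leucine
    frozenset("DE"),   # aspartic acid, glutamic acid
    frozenset("NQ"),   # asparagine, glutamine
    frozenset("ST"),   # serine, threonine
    frozenset("KR"),   # lysine, arginine
    frozenset("FY"),   # phenylalanine, tyrosine
]


def is_conservative(a: str, b: str) -> bool:
    """True if the residues differ but fall in the same group."""
    a, b = a.upper(), b.upper()
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def classify_positions(seq_a: str, seq_b: str) -> dict:
    """Count identical, conservative and other positions in equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    counts = {"identical": 0, "conservative": 0, "other": 0}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == y:
            counts["identical"] += 1
        elif is_conservative(x, y):
            counts["conservative"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical four-residue peptides.
    print(classify_positions("GVDK", "AVEW"))
```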

"Substantially similar function" refers to the function of a modified nucleic acid or a 
modified protein, with reference to the wild-type HPC1 nucleic acid or wild-type HPC1 
polypeptide. The modified polypeptide will be substantially homologous to the wild-type HPC1 
polypeptide and will have substantially the same function. The modified polypeptide may have 

15 an altered amino acid sequence and/or may contain modified amino acids. In addition to the 
similarity of function, the modified polypeptide may have other useful properties, such as a 
longer half-life. The similarity of function (activity) of the modified polypeptide may be 
substantially the same as the activity of the wild-type HPC1 polypeptide. Alternatively, the 
similarity of function (activity) of the modified polypeptide may be higher than the activity of 

20 the wild-type HPC1 polypeptide. The modified polypeptide is synthesized using conventional 
techniques, or is encoded by a modified nucleic acid and produced using conventional 
techniques. The modified nucleic acid is prepared by conventional techniques. A nucleic acid 
with a function substantially similar to the wild-type HPC1 gene function produces the modified 
protein described above. 

A polypeptide "fragment," "portion" or "segment" is a stretch of amino acid residues of
at least about five to seven contiguous amino acids, often at least about seven to nine contiguous
amino acids, typically at least about nine to 13 contiguous amino acids and, most preferably, at
least about 20 to 30 or more contiguous amino acids.

The polypeptides of the present invention, if soluble, may be coupled to a solid-phase
support, e.g., nitrocellulose, nylon, column packing materials (e.g., Sepharose beads), magnetic
beads, glass wool, plastic, metal, polymer gels, cells, or other substrates. Such supports may take
the form, for example, of beads, wells, dipsticks, or membranes.

"Target region" refers to a region of the nucleic acid which is amplified and/or
detected. The term "target sequence" refers to a sequence with which a probe or primer will
form a stable hybrid under desired conditions.

The practice of the present invention employs, unless otherwise indicated, conventional
techniques of chemistry, molecular biology, microbiology, recombinant DNA, genetics, and
immunology. See, e.g., Maniatis et al., 1982; Sambrook et al., 1989; Ausubel et al., 1992;
Glover, 1985; Anand, 1992; Guthrie and Fink, 1991. A general discussion of techniques and
materials for human gene mapping, including mapping of human chromosome 1, is provided,
e.g., in White and Lalouel, 1988.

Preparation of recombinant or chemically synthesized nucleic acids: vectors, transformation, host cells

Large amounts of the polynucleotides of the present invention may be produced by
replication in a suitable host cell. Natural or synthetic polynucleotide fragments coding for a
desired fragment will be incorporated into recombinant polynucleotide constructs, usually DNA
constructs, capable of introduction into and replication in a prokaryotic or eukaryotic cell.
Usually the polynucleotide constructs will be suitable for replication in a unicellular host, such
as yeast or bacteria, but may also be intended for introduction into (with or without integration
within the genome) cultured mammalian or plant or other eukaryotic cell lines. The purification
of nucleic acids produced by the methods of the present invention is described, e.g., in
Sambrook et al., 1989 or Ausubel et al., 1992.

The polynucleotides of the present invention may also be produced by chemical
synthesis, e.g., by the phosphoramidite method described by Beaucage and Caruthers, 1981 or
the triester method according to Matteucci and Caruthers, 1981, and may be performed on
commercial, automated oligonucleotide synthesizers. A double-stranded fragment may be
obtained from the single-stranded product of chemical synthesis either by synthesizing the
complementary strand and annealing the strands together under appropriate conditions or by
adding the complementary strand using DNA polymerase with an appropriate primer sequence.
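
The first route to a double-stranded fragment, synthesizing the complementary strand and annealing it, requires the reverse complement of the synthetic oligonucleotide. A minimal Python sketch, assuming unmodified DNA written 5'->3' in the letters A, C, G and T; the helper names and the oligonucleotide shown are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complementary_strand(seq: str) -> str:
    """Return, 5'->3', the DNA strand that anneals to `seq` (given 5'->3')."""
    s = seq.upper()
    if set(s) - set("ACGT"):
        raise ValueError("only A, C, G and T are supported")
    # Complement each base, then reverse so the result also reads 5'->3'.
    return s.translate(_COMPLEMENT)[::-1]


def is_perfect_duplex(top: str, bottom: str) -> bool:
    """True if two 5'->3' strands are fully complementary over their length."""
    return len(top) == len(bottom) and complementary_strand(top) == bottom.upper()


if __name__ == "__main__":
    oligo = "ATGCCGTA"  # hypothetical synthetic oligonucleotide
    partner = complementary_strand(oligo)
    print(partner, is_perfect_duplex(oligo, partner))
```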

Polynucleotide constructs prepared for introduction into a prokaryotic or eukaryotic host
may comprise a replication system recognized by the host, including the intended polynucleotide
fragment encoding the desired polypeptide, and will preferably also include transcriptional and
translational initiation regulatory sequences operably linked to the polypeptide encoding
segment. Expression vectors may include, for example, an origin of replication or autonomously
replicating sequence (ARS) and expression control sequences, a promoter, an enhancer and
necessary processing information sites, such as ribosome-binding sites, RNA splice sites,
polyadenylation sites, transcriptional terminator sequences, and mRNA stabilizing sequences.
Secretion signals may also be included where appropriate, whether from a native HPC1 protein
or from other receptors or from secreted polypeptides of the same or related species, which allow
the protein to cross and/or lodge in cell membranes, and thus attain its functional topology, or be
secreted from the cell. Such vectors may be prepared by means of standard recombinant
techniques well known in the art and discussed, for example, in Sambrook et al., 1989 or
Ausubel et al., 1992.

An appropriate promoter and other necessary vector sequences will be selected so as to
be functional in the host, and may include, when appropriate, those naturally associated with
HPC1 genes. Examples of workable combinations of cell lines and expression vectors are
described in Sambrook et al., 1989 or Ausubel et al., 1992; see also, e.g., Metzger et al., 1988.
Many useful vectors are known in the art and may be obtained from such vendors as Stratagene,
New England BioLabs, Promega Biotech, and others. Promoters such as the trp, lac and phage
promoters, tRNA promoters and glycolytic enzyme promoters may be used in prokaryotic hosts.
Useful yeast promoters include promoter regions for metallothionein, 3-phosphoglycerate kinase
or other glycolytic enzymes such as enolase or glyceraldehyde-3-phosphate dehydrogenase,
enzymes responsible for maltose and galactose utilization, and others. Vectors and promoters
suitable for use in yeast expression are further described in Hitzeman et al., EP 73,675A.
Appropriate non-native mammalian promoters might include the early and late promoters from
SV40 (Fiers et al., 1978) or promoters derived from murine Moloney leukemia virus, mouse
tumor virus, avian sarcoma viruses, adenovirus II, bovine papilloma virus or polyoma. In
addition, the construct may be joined to an amplifiable gene (e.g., DHFR) so that multiple copies
of the gene may be made. For appropriate enhancer and other expression control sequences, see
also Enhancers and Eukaryotic Gene Expression, Cold Spring Harbor Press, Cold Spring
Harbor, New York (1983). See also, e.g., U.S. Patent Nos. 5,691,198; 5,735,500; 5,747,469 and
5,436,146.

While such expression vectors may replicate autonomously, they may also replicate by
being inserted into the genome of the host cell, by methods well known in the art.

Expression and cloning vectors will likely contain a selectable marker, a gene encoding a
protein necessary for survival or growth of a host cell transformed with the vector. The presence
of this gene ensures growth of only those host cells which express the inserts. Typical selection
genes encode proteins that a) confer resistance to antibiotics or other toxic substances, e.g.,
ampicillin, neomycin, methotrexate, etc.; b) complement auxotrophic deficiencies; or c) supply
critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase
for Bacilli. The choice of the proper selectable marker will depend on the host cell, and
appropriate markers for different hosts are well known in the art.

The vectors containing the nucleic acids of interest can be transcribed in vitro, and the
resulting RNA introduced into the host cell by well-known methods, e.g., by injection (see
Kubo et al., 1988), or the vectors can be introduced directly into host cells by methods well
known in the art, which vary depending on the type of cellular host, including electroporation;
transfection employing calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran,
or other substances; microprojectile bombardment; lipofection; infection (where the vector is an
infectious agent, such as a retroviral genome); and other methods. See generally Sambrook et
al., 1989 and Ausubel et al., 1992. The introduction of the polynucleotides into the host cell by
any method known in the art, including, inter alia, those described above, will be referred to
herein as "transformation." The cells into which the nucleic acids described above have been
introduced are meant to also include the progeny of such cells.

Large quantities of the nucleic acids and polypeptides of the present invention may be
prepared by expressing the HPC1 nucleic acids or portions thereof in vectors or other expression
vehicles in compatible prokaryotic or eukaryotic host cells. The most commonly used
prokaryotic hosts are strains of Escherichia coli, although other prokaryotes, such as Bacillus
subtilis or Pseudomonas, may also be used.

Mammalian or other eukaryotic host cells, such as those of yeast, filamentous fungi,
plant, insect, or amphibian or avian species, may also be useful for production of the proteins of
the present invention. Propagation of mammalian cells in culture is per se well known. See
Jakoby and Pastan, 1979. Examples of commonly used mammalian host cell lines are VERO
and HeLa cells, Chinese hamster ovary (CHO) cells, and WI38, BHK, and COS cell lines. An
example of a commonly used insect cell line is SF9. However, it will be appreciated by the
skilled practitioner that other cell lines may be appropriate, e.g., to provide higher expression,
desirable glycosylation patterns, or other features.

Clones are selected by using markers depending on the mode of the vector construction.
The marker may be on the same or a different DNA molecule, preferably the same DNA
molecule. In prokaryotic hosts, the transformant may be selected, e.g., by resistance to
ampicillin, tetracycline or other antibiotics. Production of a particular product based on
temperature sensitivity may also serve as an appropriate marker.

Prokaryotic or eukaryotic cells transformed with the polynucleotides of the present 
invention will be useful not only for the production of the nucleic acids and polypeptides of the 
present invention, but also, for example, in studying the characteristics of HPC1 polypeptides. 

Antisense polynucleotide sequences are useful in preventing or diminishing the
expression of the HPC1 locus, as will be appreciated by those skilled in the art. For example,
polynucleotide vectors containing all or a portion of the HPC1 locus or other sequences from the
HPC1 region (particularly those flanking the HPC1 locus) may be placed under the control of a
promoter in an antisense orientation and introduced into a cell. Expression of such an antisense
construct within a cell will interfere with HPC1 transcription and/or translation and/or
replication.
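
For an antisense construct, the fragment is inserted in reverse orientation relative to the promoter, so the transcript is the reverse complement of the sense strand, written in RNA. The sketch below assumes a sense-strand DNA fragment given 5'->3'; the helper names and fragments are hypothetical, since the HPC1 sequences themselves are not reproduced here.

```python
_RNA_PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}


def antisense_rna(sense_dna: str) -> str:
    """RNA (5'->3') transcribed from a sense-strand fragment cloned in reverse.

    The result is complementary to the sense mRNA, so it can base-pair with it.
    """
    s = sense_dna.upper()
    if set(s) - set(_RNA_PAIR):
        raise ValueError("only A, C, G and T are supported")
    # Read the sense strand 3'->5' and write the RNA complement of each base.
    return "".join(_RNA_PAIR[base] for base in reversed(s))


def sense_mrna(sense_dna: str) -> str:
    """mRNA sequence corresponding to a sense-strand DNA fragment."""
    return sense_dna.upper().replace("T", "U")


if __name__ == "__main__":
    fragment = "ATGGCA"  # hypothetical sense-strand fragment
    print(sense_mrna(fragment), antisense_rna(fragment))
```

Reading the antisense RNA from its 3' end against the mRNA from its 5' end pairs every base (A with U, G with C), which is the duplex expected to interfere with expression.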

The probes and primers based on the HPC1 gene sequences disclosed herein are used to 
identify homologous HPC1 gene sequences and proteins in other species. These HPC1 gene 
sequences and proteins are used in the diagnostic/prognostic, therapeutic and drug screening 
methods described herein for the species from which they have been isolated. 


Methods of Use: Nucleic Acid Diagnosis and Diagnostic Kits 

In order to detect the presence of an HPC1 allele predisposing an individual to cancer, a
biological sample such as blood is prepared and analyzed for the presence or absence of
susceptibility alleles of HPC1. In order to detect the presence of neoplasia, the progression
toward malignancy of a precursor lesion, or as a prognostic indicator, a biological sample of the
lesion is prepared and analyzed for the presence or absence of mutant alleles of HPC1. Results
of these tests and interpretive information are returned to the health care provider for
communication to the tested individual. Such diagnoses may be performed by diagnostic
laboratories, or, alternatively, diagnostic kits are manufactured and sold to health care providers
or to private individuals for self-diagnosis.

In one preferred embodiment, the screening method involves amplification of the relevant
HPC1 sequences. In another preferred embodiment of the invention, the screening method
involves a non-PCR based strategy. Such screening methods include two-step label amplification
methodologies that are well known in the art. Both PCR and non-PCR based screening strategies
can detect target sequences with a high level of sensitivity.

The most popular method used today is target amplification. Here, the target nucleic acid
sequence is amplified with polymerases. One particularly preferred method using
polymerase-driven amplification is the polymerase chain reaction (PCR). The polymerase chain
reaction and other polymerase-driven amplification assays can achieve over a million-fold
increase in copy number through the use of polymerase-driven amplification cycles. Once
amplified, the resulting nucleic acid can be sequenced or used as a substrate for DNA probes.
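
The million-fold figure follows from the exponential nature of cycling: with ideal doubling, n cycles give 2^n copies per starting template, and 2^20 is about 1.05 million. The sketch below assumes the idealized model (1 + E)^n, where E is a hypothetical per-cycle efficiency; real reactions eventually plateau, so these numbers are upper bounds.

```python
import math


def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Copy-number increase after `cycles` cycles at a per-cycle efficiency.

    efficiency = 1.0 means every template is copied in every cycle (doubling).
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 + efficiency) ** cycles


def cycles_for_fold(target_fold: float, efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles reaching at least `target_fold`."""
    if target_fold <= 1.0:
        return 0
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(math.log(target_fold) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(fold_amplification(20))       # ideal doubling for 20 cycles
    print(cycles_for_fold(1e6))         # cycles for a million-fold increase
    print(cycles_for_fold(1e6, 0.9))    # same target at 90% efficiency
```

Under this model, about 20 ideal cycles suffice for a million-fold increase, and a modest drop in efficiency adds only a couple of cycles.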

When the probes are used to detect the presence of the target sequences (for example, in
screening for cancer susceptibility), the biological sample to be analyzed, such as blood or
serum, may be treated, if desired, to extract the nucleic acids. The sample nucleic acid may be
prepared in various ways to facilitate detection of the target sequence, e.g., denaturation,
restriction digestion, electrophoresis or dot blotting. The targeted region of the analyte nucleic
acid usually must be at least partially single-stranded to form hybrids with the targeting
sequence of the probe. If the sequence is naturally single-stranded, denaturation will not be
required. However, if the sequence is double-stranded, the sequence will probably need to be
denatured. Denaturation can be carried out by various techniques known in the art.

Analyte nucleic acid and probe are incubated under conditions which promote stable
hybrid formation of the target sequence in the probe with the putative targeted sequence in the
analyte. The region of the probes which is used to bind to the analyte can be made completely
complementary to the targeted region of human chromosome 1. Therefore, high stringency
conditions are desirable in order to prevent false positives. However, conditions of high
stringency are used only if the probes are complementary to regions of the chromosome which
are unique in the genome. The stringency of hybridization is determined by a number of factors
during hybridization and during the washing procedure, including temperature, ionic strength,
base composition, probe length, and concentration of formamide. These factors are outlined in,
for example, Maniatis et al., 1982 and Sambrook et al., 1989. Under certain circumstances, the
formation of higher order hybrids, such as triplexes, quadruplexes, etc., may be desired to
provide the means of detecting target sequences.

Detection, if any, of the resulting hybrid is usually accomplished by the use of labeled
probes. Alternatively, the probe may be unlabeled, but may be detectable by specific binding
with a ligand which is labeled, either directly or indirectly. Suitable labels, and methods for
labeling probes and ligands, are known in the art, and include, for example, radioactive labels
which may be incorporated by known methods (e.g., nick translation, random priming or
kinasing), biotin, fluorescent groups, chemiluminescent groups (e.g., dioxetanes, particularly
triggered dioxetanes), enzymes, antibodies and the like. Variations of this basic scheme are
known in the art, and include those variations that facilitate separation of the hybrids to be
detected from extraneous materials and/or that amplify the signal from the labeled moiety. A
number of these variations are reviewed in, e.g., Matthews and Kricka, 1988; Landegren et al.,
1988; Mittlin, 1989; U.S. Patent 4,868,105; and EPO Publication No. 225,807.

As noted above, non-PCR based screening assays are also contemplated in this invention.
One such procedure hybridizes a nucleic acid probe (or an analog such as a methyl phosphonate
backbone replacing the normal phosphodiester) to the low-level DNA target. This probe may
have an enzyme covalently linked to it, such that the covalent linkage does not interfere
with the specificity of the hybridization. This enzyme-probe-conjugate-target nucleic acid
complex can then be isolated away from the free probe-enzyme conjugate, and a substrate is
added for enzyme detection. Enzymatic activity is observed as a change in color development or
luminescent output, resulting in a 10^3- to 10^6-fold increase in sensitivity. For an example
relating to the preparation of oligodeoxynucleotide-alkaline phosphatase conjugates and their use
as hybridization probes, see Jablonski et al., 1986.

Two-step label amplification methodologies are known in the art. These assays work on
the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic
acid probe capable of specifically binding HPC1. Allele specific probes are also contemplated
within the scope of this example, and exemplary allele specific probes include probes
encompassing the predisposing or potentially predisposing mutations summarized in Tables 9
and 10 of this patent application.

In one example, the small ligand attached to the nucleic acid probe is specifically
recognized by an antibody-enzyme conjugate. In one embodiment of this example, digoxigenin
is attached to the nucleic acid probe. Hybridization is detected by an antibody-alkaline
phosphatase conjugate which turns over a chemiluminescent substrate. For methods for labeling
nucleic acid probes according to this embodiment, see Martin et al., 1990. In a second example,
the small ligand is recognized by a second ligand-enzyme conjugate that is capable of
specifically complexing to the first ligand. A well known embodiment of this example is the
biotin-avidin type of interaction. For methods for labeling nucleic acid probes and their use in
biotin-avidin based assays, see Rigby et al., 1977 and Nguyen et al., 1992.

It is also contemplated within the scope of this invention that the nucleic acid probe
assays of this invention will employ a cocktail of nucleic acid probes capable of detecting HPC1.
Thus, in one example to detect the presence of HPC1 in a cell sample, more than one probe
complementary to HPC1 is employed and in particular the number of different probes is
alternatively 2, 3, or 5 different nucleic acid probe sequences. In another example, to detect the
presence of mutations in the HPC1 gene sequence in a patient, more than one probe
complementary to HPC1 is employed where the cocktail includes probes capable of binding to
the allele-specific mutations identified in populations of patients with alterations in HPC1. In
this embodiment, any number of probes can be used, and the cocktail will preferably include
probes corresponding to the major gene mutations identified as predisposing an individual to
prostate cancer. Some candidate probes contemplated within the scope of the invention include
probes that include the allele-specific mutations identified in Tables 9 and 10 and those that have
the HPC1 regions corresponding to SEQ ID NOs: 1-52 both 5' and 3' to the mutation site.
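
A sketch of how wild-type and mutant allele-specific probe pairs might be derived for such a cocktail, each carrying equal lengths of sequence 5' and 3' to the mutation site. The reference sequence, positions, helper names and default 10-base flank are hypothetical placeholders; actual probes would be drawn from SEQ ID NOs: 1-52 and the mutations in Tables 9 and 10, which are not reproduced here.

```python
def allele_specific_probes(reference: str, position: int, alt_base: str,
                           flank: int = 10) -> tuple[str, str]:
    """Return (wild-type probe, mutant probe) centred on a single-base change.

    `position` is 0-based in `reference`; each probe carries `flank` bases of
    reference sequence on both the 5' and 3' sides of the variant site.
    """
    ref = reference.upper()
    alt = alt_base.upper()
    if flank < 0 or position - flank < 0 or position + flank >= len(ref):
        raise ValueError("not enough flanking sequence around the site")
    if len(alt) != 1 or alt not in "ACGT" or alt == ref[position]:
        raise ValueError("alt_base must be one base differing from the reference")
    left = ref[position - flank:position]
    right = ref[position + 1:position + flank + 1]
    return left + ref[position] + right, left + alt + right


def probe_cocktail(reference: str, variants: list[tuple[int, str]],
                   flank: int = 10) -> list[str]:
    """Wild-type and mutant probes for several (position, alt_base) variants."""
    probes = []
    for position, alt in variants:
        probes.extend(allele_specific_probes(reference, position, alt, flank))
    return probes


if __name__ == "__main__":
    ref = "AAAAACCCCCGTTTTTGGGGG"  # hypothetical reference sequence
    print(probe_cocktail(ref, [(10, "A"), (5, "T")], flank=3))
```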

Methods of Use: Peptide Diagnosis and Diagnostic Kits 

The neoplastic condition of lesions can also be detected on the basis of the alteration of
wild-type HPC1 polypeptide. Such alterations can be determined by sequence analysis in
accordance with conventional techniques. More preferably, antibodies (polyclonal or
monoclonal) are used to detect differences in, or the absence of, HPC1 peptides. The antibodies
may be prepared as discussed above under the heading "Antibodies" and as further shown in
Examples 12 and 13. Other techniques for raising and purifying antibodies are well known in
the art and any such techniques may be chosen to achieve the preparations claimed in this
invention. In a preferred embodiment of the invention, antibodies will immunoprecipitate HPC1
proteins from solution as well as react with HPC1 protein on Western or immunoblots of
polyacrylamide gels. In another preferred embodiment, antibodies will detect HPC1 proteins in
paraffin or frozen tissue sections, using immunocytochemical techniques.

Preferred embodiments relating to methods for detecting HPC1 or its mutations include
enzyme linked immunosorbent assays (ELISA), radioimmunoassays (RIA), immunoradiometric
assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich assays using
monoclonal and/or polyclonal antibodies. Exemplary sandwich assays are described by David et
al. in U.S. Patent Nos. 4,376,110 and 4,486,530, hereby incorporated by reference, and
exemplified in Example 15.

Methods of Use: Drug Screening 

This invention is particularly useful for screening compounds by using the HPC1
polypeptide or binding fragment thereof in any of a variety of drug screening techniques.

The HPC1 polypeptide or fragment employed in such a test may either be free in
solution, affixed to a solid support, or borne on a cell surface. One method of drug screening
utilizes eukaryotic or prokaryotic host cells which are stably transformed with recombinant
polynucleotides expressing the polypeptide or fragment, preferably in competitive binding
assays. Such cells, either in viable or fixed form, can be used for standard binding assays. One
may measure, for example, the formation of complexes between an HPC1 polypeptide or
fragment and the agent being tested, or examine the degree to which the formation of a complex
between an HPC1 polypeptide or fragment and a known ligand is interfered with by the agent
being tested.

Thus, the present invention provides methods of screening for drugs comprising
contacting such an agent with an HPC1 polypeptide or fragment thereof and assaying (i) for the
presence of a complex between the agent and the HPC1 polypeptide or fragment, or (ii) for the
presence of a complex between the HPC1 polypeptide or fragment and a ligand, by methods well
known in the art. In such competitive binding assays the HPC1 polypeptide or fragment is
typically labeled. Free HPC1 polypeptide or fragment is separated from that present in a
protein:protein complex, and the amount of free (i.e., uncomplexed) label is a measure of the
binding of the agent being tested to HPC1 or its interference with HPC1:ligand binding,
respectively.
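
The arithmetic for the ligand-interference format (ii) can be sketched as follows, assuming label is measured in arbitrary units and that all labeled HPC1 not recovered in the free fraction is complexed. The helper names and counts are hypothetical.

```python
def bound_fraction(free_label: float, total_label: float) -> float:
    """Fraction of labeled HPC1 polypeptide found in complexes."""
    if total_label <= 0 or not 0 <= free_label <= total_label:
        raise ValueError("need 0 <= free_label <= total_label and total_label > 0")
    return 1.0 - free_label / total_label


def percent_inhibition(free_without_agent: float, free_with_agent: float,
                       total_label: float) -> float:
    """Loss of HPC1:ligand complex caused by the test agent, in percent."""
    baseline = bound_fraction(free_without_agent, total_label)
    if baseline == 0:
        raise ValueError("no complex formed without the agent")
    treated = bound_fraction(free_with_agent, total_label)
    return 100.0 * (baseline - treated) / baseline


if __name__ == "__main__":
    # Hypothetical counts: more free label with the agent means less complex.
    print(round(percent_inhibition(200, 600, 1000), 1))
```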

Another technique for drug screening provides high throughput screening for compounds
having suitable binding affinity to the HPC1 polypeptides and is described in detail in Geysen,
PCT published application WO 84/03564, published on September 13, 1984. Briefly stated,
large numbers of different small peptide test compounds are synthesized on a solid substrate,
such as plastic pins or some other surface. The peptide test compounds are reacted with HPC1
polypeptide and washed. Bound HPC1 polypeptide is then detected by methods well known in
the art.

Purified HPC1 can be coated directly onto plates for use in the aforementioned drug
screening techniques. Alternatively, non-neutralizing antibodies to the polypeptide can be used as
capture antibodies to immobilize the HPC1 polypeptide on the solid phase.

This invention also contemplates the use of competitive drug screening assays in which
neutralizing antibodies capable of specifically binding the HPC1 polypeptide compete with a test
compound for binding to the HPC1 polypeptide or fragments thereof. In this manner, the
antibodies can be used to detect the presence of any peptide which shares one or more antigenic
determinants of the HPC1 polypeptide.

A further technique for drug screening involves the use of host eukaryotic cell lines or
cells (such as described above) which have a nonfunctional HPC1 gene. These host cell lines or
cells are defective at the HPC1 polypeptide level. The host cell lines or cells are grown in the
presence of drug compound. The rate of growth of the host cells is measured to determine if the
compound is capable of regulating the growth of HPC1 defective cells.
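
The growth-rate comparison can be made quantitative by assuming exponential growth between two cell counts, so that the specific growth rate is ln(N_end / N_start) / t. The sketch below uses hypothetical counts and helper names; the growth model and time points are assumptions, not part of the described method.

```python
import math


def specific_growth_rate(count_start: float, count_end: float, hours: float) -> float:
    """Per-hour growth rate, assuming exponential growth between two counts."""
    if count_start <= 0 or count_end <= 0 or hours <= 0:
        raise ValueError("counts and elapsed time must be positive")
    return math.log(count_end / count_start) / hours


def doubling_time(rate_per_hour: float) -> float:
    """Population doubling time in hours for a positive growth rate."""
    if rate_per_hour <= 0:
        raise ValueError("doubling time is undefined for non-positive growth")
    return math.log(2) / rate_per_hour


if __name__ == "__main__":
    # Hypothetical counts for HPC1-defective cells over 48 hours.
    untreated = specific_growth_rate(1e5, 4e5, 48)
    treated = specific_growth_rate(1e5, 2e5, 48)
    print(round(doubling_time(untreated), 1), round(doubling_time(treated), 1))
    print("relative growth rate:", round(treated / untreated, 2))
```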

Briefly, a method of screening for a substance which modulates activity of a polypeptide
may include contacting one or more test substances with the polypeptide in a suitable reaction
medium, testing the activity of the treated polypeptide and comparing that activity with the
activity of the polypeptide in comparable reaction medium untreated with the test substance or
substances. A difference in activity between the treated and untreated polypeptides is indicative
of a modulating effect of the relevant test substance or substances.
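
The treated-versus-untreated comparison can be sketched as a percent change in mean activity across replicate measurements. The 20% cut-off used to call an effect, the labels, and the helper names are hypothetical placeholders; a real screen would set the threshold from replicate variability.

```python
from statistics import mean


def percent_change(treated: list[float], untreated: list[float]) -> float:
    """Change in mean activity of treated vs untreated polypeptide, in percent."""
    baseline = mean(untreated)
    if baseline == 0:
        raise ValueError("untreated activity must be non-zero")
    return 100.0 * (mean(treated) - baseline) / baseline


def call_modulation(treated: list[float], untreated: list[float],
                    threshold_percent: float = 20.0) -> str:
    """Label a test substance by the sign and size of its effect on activity."""
    change = percent_change(treated, untreated)
    if change >= threshold_percent:
        return "increase"
    if change <= -threshold_percent:
        return "decrease"
    return "no clear effect"


if __name__ == "__main__":
    # Hypothetical duplicate activity readings.
    print(call_modulation([50, 52], [100, 98]))
```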

Prior to or as well as being screened for modulation of activity, test substances may be
screened for ability to interact with the polypeptide, e.g., in a yeast two-hybrid system (e.g.,
Bartel et al., 1993; Fields and Song, 1989; Chevray and Nathans, 1992; Lee et al., 1995). This
system may be used as a coarse screen prior to testing a substance for actual ability to modulate
activity of the polypeptide. Alternatively, the screen could be used to screen test substances for
binding to an HPC1 specific binding partner, or to find mimetics of an HPC1 polypeptide.

Methods of Use: Rational Drug Design 

The goal of rational drug design is to produce structural analogs of biologically active
polypeptides of interest or of small molecules with which they interact (e.g., agonists,
antagonists, inhibitors) in order to fashion drugs which are, for example, more active or stable
forms of the polypeptide, or which, e.g., enhance or interfere with the function of a polypeptide
in vivo. See, e.g., Hodgson, 1991. In one approach, one first determines the three-dimensional
structure of a protein of interest (e.g., HPC1 polypeptide) or, for example, of the HPC1-receptor
or ligand complex, by x-ray crystallography, by computer modeling or, most typically, by a
combination of approaches. Less often, useful information regarding the structure of a
polypeptide may be gained by modeling based on the structure of homologous proteins. An
example of rational drug design is the development of HIV protease inhibitors (Erickson et al.,
1990). In addition, peptides (e.g., HPC1 polypeptide) are analyzed by an alanine scan (Wells,
1991). In this technique, an amino acid residue is replaced by Ala, and its effect on the peptide's
activity is determined. Each of the amino acid residues of the peptide is analyzed in this manner
to determine the important regions of the peptide.
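
The alanine-scan variants themselves are simple to enumerate. This sketch assumes a peptide written in one-letter codes and skips positions that are already alanine (some scans substitute glycine there instead); `alanine_scan` and the peptides shown are hypothetical.

```python
def alanine_scan(peptide: str) -> list[tuple[int, str, str]]:
    """Single-residue alanine variants of `peptide`.

    Returns (1-based position, original residue, variant sequence) for each
    non-alanine position; positions that are already Ala are skipped.
    """
    seq = peptide.upper()
    variants = []
    for i, residue in enumerate(seq):
        if residue == "A":
            continue
        # Replace exactly one residue with Ala and keep the rest unchanged.
        variants.append((i + 1, residue, seq[:i] + "A" + seq[i + 1:]))
    return variants


if __name__ == "__main__":
    for position, original, variant in alanine_scan("MKTAYIAK"):
        print(position, original, variant)
```

Each variant would then be assayed, and the positions whose substitution most reduces activity mark the important regions of the peptide.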

It is also possible to isolate a target-specific antibody, selected by a functional assay, and
then to solve its crystal structure. In principle, this approach yields a pharmacophore upon which
subsequent drug design can be based. It is possible to bypass protein crystallography altogether
by generating anti-idiotypic antibodies (anti-ids) to a functional, pharmacologically active
antibody. As a mirror image of a mirror image, the binding site of the anti-ids would be
expected to be an analog of the original receptor. The anti-id could then be used to identify and
isolate peptides from banks of chemically or biologically produced peptides. Selected
peptides would then act as the pharmacophore.

Thus, one may design drugs which have, e.g., improved HPC1 polypeptide activity or
stability or which act as inhibitors, agonists, antagonists, etc. of HPC1 polypeptide activity. By
virtue of the availability of cloned HPC1 sequences, sufficient amounts of the HPC1 polypeptide
may be made available to perform such analytical studies as x-ray crystallography. In addition,
the knowledge of the HPC1 protein sequence provided herein will guide those employing
computer modeling techniques in place of, or in addition to, x-ray crystallography.

Following identification of a substance which modulates or affects polypeptide activity,
the substance may be investigated further. Furthermore, it may be manufactured and/or used in
the preparation, i.e., manufacture or formulation, of a composition such as a medicament,
pharmaceutical composition or drug. These may be administered to individuals.

Thus, the present invention extends in various aspects not only to a substance identified
using a nucleic acid molecule as a modulator of polypeptide activity, in accordance with what is
disclosed herein, but also to a pharmaceutical composition, medicament, drug or other
composition comprising such a substance, a method comprising administration of such a
composition to a patient, e.g., for treatment of prostate cancer, use of such a substance in the
manufacture of a composition for administration, e.g., for treatment of prostate cancer, and a
method of making a pharmaceutical composition comprising admixing such a substance with a
pharmaceutically acceptable excipient, vehicle or carrier, and optionally other ingredients.

A substance identified as a modulator of polypeptide function may be peptide or
non-peptide in nature. Non-peptide "small molecules" are often preferred for many in vivo
pharmaceutical uses. Accordingly, a mimetic or mimic of the substance (particularly if a peptide)
may be designed for pharmaceutical use.

The designing of mimetics to a known pharmaceutically active compound is a known
approach to the development of pharmaceuticals based on a "lead" compound. This might be
desirable where the active compound is difficult or expensive to synthesize or where it is
unsuitable for a particular method of administration, e.g., pure peptides are unsuitable active
agents for oral compositions as they tend to be quickly degraded by proteases in the alimentary
canal. Mimetic design, synthesis and testing are generally used to avoid randomly screening large
numbers of molecules for a target property.

There are several steps commonly taken in the design of a mimetic from a compound
having a given target property. First, the particular parts of the compound that are critical and/or
important in determining the target property are determined. In the case of a peptide, this can be
done by systematically varying the amino acid residues in the peptide, e.g., by substituting each
residue in turn. Alanine scans of peptides are commonly used to refine such peptide motifs.
These parts or residues constituting the active region of the compound are known as its
"pharmacophore".
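The residue-by-residue substitution described above can be illustrated with a short sketch. The function below is a hypothetical helper, not part of the disclosed method; it assumes the peptide is given as a string of uppercase one-letter amino acid codes and generates one variant per non-alanine residue. Each variant would then be assayed for the target property, and positions whose substitution abolishes activity point to the pharmacophore.

```python
def alanine_scan(peptide: str) -> list[tuple[int, str]]:
    """Return (position, variant) pairs, one per residue, with that residue set to Ala.

    Positions are 1-based. Residues that are already alanine ("A") are skipped,
    since substituting them would not change the sequence.
    """
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue
        # Replace only the residue at index i with alanine.
        variant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((i + 1, variant))
    return variants


if __name__ == "__main__":
    # Hypothetical hexapeptide used only for illustration; not an HPC1 sequence.
    for pos, seq in alanine_scan("KLVFAE"):
        print(pos, seq)  # 1 ALVFAE, 2 KAVFAE, 3 KLAFAE, 4 KLVAAE, 6 KLVFAA
```

Position 5 is absent from the output because that residue is already alanine.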

Once the pharmacophore has been found, its structure is modeled according to its
physical properties, e.g., stereochemistry, bonding, size and/or charge, using data from a range
of sources, e.g., spectroscopic techniques, x-ray diffraction data and NMR. Computational
analysis, similarity mapping (which models the charge and/or volume of a pharmacophore,
rather than the bonding between atoms) and other techniques can be used in this modeling
process.

In a variant of this approach, the three-dimensional structures of the ligand and its binding
partner are modeled. This can be especially useful where the ligand and/or binding partner
change conformation on binding, allowing the model to take account of this in the design of the
mimetic.
A template molecule is then selected onto which chemical groups which mimic the
pharmacophore can be grafted. The template molecule and the chemical groups grafted onto it
can conveniently be selected so that the mimetic is easy to synthesize, is likely to be
pharmacologically acceptable, and does not degrade in vivo, while retaining the biological
activity of the lead compound. Alternatively, where the mimetic is peptide-based, further
stability can be achieved by cyclizing the peptide, increasing its rigidity. The mimetic or
mimetics found by this approach can then be screened to see whether they have the target
property, or to what extent they exhibit it. Further optimization or modification can then be
carried out to arrive at one or more final mimetics for in vivo or clinical testing.


Methods of Use: Gene Therapy 

According to the present invention, a method is also provided of supplying wild-type
HPC1 function to a cell which carries mutant HPC1 alleles. Supplying such a function should
suppress neoplastic growth of the recipient cells. The wild-type HPC1 gene or a part of the gene
may be introduced into the cell in a vector such that the gene remains extrachromosomal. In
such a situation, the gene will be expressed by the cell from the extrachromosomal location. If a
gene fragment is introduced and expressed in a cell carrying a mutant HPC1 allele, the gene
fragment should encode a part of the HPC1 protein which is required for non-neoplastic growth
of the cell. More preferred is the situation where the wild-type HPC1 gene or a part thereof is
introduced into the mutant cell in such a way that it recombines with the endogenous mutant
HPC1 gene present in the cell. Such recombination requires a double recombination event
which results in the correction of the HPC1 gene mutation. Vectors for introduction of genes
both for recombination and for extrachromosomal maintenance are known in the art, and any
suitable vector may be used. Methods for introducing DNA into cells such as electroporation,
calcium phosphate coprecipitation and viral transduction are known in the art, and the choice of
method is within the competence of the routineer. Cells transformed with the wild-type HPC1
gene can be used as model systems to study cancer remission and drug treatments which
promote such remission.

As generally discussed above, the HPC1 gene or fragment, where applicable, may be
employed in gene therapy methods in order to increase the amount of the expression products of
such genes in cancer cells. Such gene therapy is particularly appropriate for use in both
cancerous and pre-cancerous cells in which the level of HPC1 polypeptide is absent or
diminished compared to normal cells. It may also be useful to increase the level of expression of
a given HPC1 gene even in those tumor cells in which the mutant gene is expressed at a
"normal" level, but the gene product is not fully functional.

Gene therapy would be carried out according to generally accepted methods, for
example, as described by Friedman, 1991. Cells from a patient's tumor would first be analyzed
by the diagnostic methods described above to ascertain the production of HPC1 polypeptide in
the tumor cells. A virus or plasmid vector (see further details below), containing a copy of the
HPC1 gene linked to expression control elements and capable of replicating inside the tumor
cells, is prepared. Suitable vectors are known, such as those disclosed in U.S. Patent 5,252,479,
PCT published application WO 93/07282 and U.S. Patent Nos. 5,691,198; 5,747,469; 5,436,146
and 5,753,500. The vector is then injected into the patient, either locally at the site of the tumor
or systemically (in order to reach any tumor cells that may have metastasized to other sites). If
the transfected gene is not permanently incorporated into the genome of each of the targeted
tumor cells, the treatment may have to be repeated periodically.
Gene transfer systems known in the art may be useful in the practice of the gene therapy
methods of the present invention. These include viral and nonviral transfer methods. A number
of viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak
et al., 1992), adenovirus (Berkner, 1992; Berkner et al., 1988; Gorziglia and Kapikian, 1992;
Quantin et al., 1992; Rosenfeld et al., 1992; Wilkinson et al., 1992; Stratford-Perricaudet et al.,
1990), vaccinia virus (Moss, 1992), adeno-associated virus (Muzyczka, 1992; Ohi et al., 1990;
Russell and Hirata, 1998), herpes viruses including HSV and EBV (Margolskee, 1992; Johnson
et al., 1992; Fink et al., 1992; Breakfield and Geller, 1987; Freese et al., 1990; Fink et al., 1996),
lentiviruses (Naldini et al., 1996), Sindbis and Semliki Forest virus (Berglund et al., 1993), and
retroviruses of avian (Bandyopadhyay and Temin, 1984; Petropoulos et al., 1992), murine
(Miller, 1992; Miller et al., 1985; Sorge et al., 1984; Mann and Baltimore, 1985; Miller et al.,
1988), and human origin (Shimada et al., 1991; Helseth et al., 1990; Page et al., 1990;
Buchschacher and Panganiban, 1992). Most human gene therapy protocols have been based on
disabled murine retroviruses.

Nonviral gene transfer methods known in the art include chemical techniques such as
calcium phosphate coprecipitation (Graham and van der Eb, 1973; Pellicer et al., 1980);
mechanical techniques, for example microinjection (Anderson et al., 1980; Gordon et al., 1980;
Brinster et al., 1981; Constantini and Lacy, 1981); membrane fusion-mediated transfer via
liposomes (Felgner et al., 1987; Wang and Huang, 1989; Kaneda et al., 1989; Stewart et al.,
1992; Nabel et al., 1990; Lim et al., 1992); and direct DNA uptake and receptor-mediated DNA
transfer (Wolff et al., 1990; Wu et al., 1991; Zenke et al., 1990; Wu et al., 1989b; Wolff et al.,
1991; Wagner et al., 1990; Wagner et al., 1991; Cotten et al., 1990; Curiel et al., 1991a; Curiel
et al., 1991b). Viral-mediated gene transfer can be combined with direct in vivo gene transfer
using liposome delivery, allowing one to direct the viral vectors to the tumor cells and not into
the surrounding nondividing cells. Alternatively, the retroviral vector producer cell line can be
injected into tumors (Culver et al., 1992). Injection of producer cells would then provide a
continuous source of vector particles. This technique has been approved for use in humans with
inoperable brain tumors.

In an approach which combines biological and physical gene transfer methods, plasmid
DNA of any size is combined with a polylysine-conjugated antibody specific to the adenovirus
hexon protein, and the resulting complex is bound to an adenovirus vector. The trimolecular
complex is then used to infect cells. The adenovirus vector permits efficient binding,
internalization, and degradation of the endosome before the coupled DNA is damaged. For other
techniques for the delivery of adenovirus-based vectors, see Schneider et al. (1998) and U.S.
Patent Nos. 5,691,198; 5,747,469; 5,436,146 and 5,753,500.

Liposome/DNA complexes have been shown to be capable of mediating direct in vivo
gene transfer. While in standard liposome preparations the gene transfer process is nonspecific,
localized in vivo uptake and expression have been reported in tumor deposits, for example,
following direct in situ administration (Nabel, 1992).

Expression vectors in the context of gene therapy are meant to include those constructs
containing sequences sufficient to express a polynucleotide that has been cloned therein. In viral
expression vectors, the construct contains viral sequences sufficient to support packaging of the
construct. If the polynucleotide encodes HPC1, expression will produce HPC1. If the
polynucleotide encodes an antisense polynucleotide or a ribozyme, expression will produce the
antisense polynucleotide or ribozyme. Thus, in this context, expression does not require that a
protein product be synthesized. In addition to the polynucleotide cloned into the expression
vector, the vector also contains a promoter functional in eukaryotic cells. The cloned
polynucleotide sequence is under control of this promoter. Suitable eukaryotic promoters
include those described above. The expression vector may also include sequences, such as
selectable markers and other sequences described herein.
Gene transfer techniques which target DNA directly to prostate tissues, e.g., epithelial
cells of the prostate, are preferred. Receptor-mediated gene transfer, for example, is
accomplished by the conjugation of DNA (usually in the form of covalently closed supercoiled
plasmid) to a protein ligand via polylysine. Ligands are chosen on the basis of the presence of
the corresponding ligand receptors on the cell surface of the target cell/tissue type. One
appropriate receptor/ligand pair may include the estrogen receptor and its ligand, estrogen (and
estrogen analogues). These ligand-DNA conjugates can be injected directly into the blood if
desired and are directed to the target tissue where receptor binding and internalization of the
DNA-protein complex occurs. To overcome the problem of intracellular destruction of DNA,
coinfection with adenovirus can be included to disrupt endosome function.

The therapy involves two steps which can be performed singly or jointly. In the first
step, prepubescent females who carry an HPC1 susceptibility allele are treated with a gene
delivery vehicle such that some or all of their mammary ductal epithelial precursor cells receive
at least one additional copy of a functional normal HPC1 allele. In this step, the treated
individuals have reduced risk of prostate cancer to the extent that the effect of the susceptible
allele has been countered by the presence of the normal allele. In the second step of a preventive
therapy, predisposed young females, in particular women who have received the proposed gene
therapeutic treatment, undergo hormonal therapy to mimic the effects on the prostate of a full
term pregnancy.


Methods of Use: Peptide Therapy 

Peptides which have HPC1 activity can be supplied to cells which carry mutant or
missing HPC1 alleles. Protein can be produced by expression of the cDNA sequence in bacteria,
for example, using known expression vectors. Alternatively, HPC1 polypeptide can be extracted
from HPC1-producing mammalian cells. In addition, the techniques of synthetic chemistry can
be employed to synthesize HPC1 protein. Any of these techniques can provide the preparation of
the present invention which comprises the HPC1 protein. The preparation is substantially free of
other human proteins. This is most readily accomplished by synthesis in a microorganism or in vitro.
Active HPC1 molecules can be introduced into cells by microinjection or by use of
liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively
or by diffusion. Extracellular application of the HPC1 gene product may be sufficient to affect
tumor growth. Supply of molecules with HPC1 activity should lead to partial reversal of the
neoplastic state. Other molecules with HPC1 activity (for example, peptides, drugs or organic
compounds) may also be used to effect such a reversal. Modified polypeptides having substantially
similar function are also used for peptide therapy.

Methods of Use: Transformed Hosts

Similarly, cells and animals which carry a mutant HPC1 allele can be used as model
systems to study and test for substances which have potential as therapeutic agents. The cells
are typically cultured epithelial cells. These may be isolated from individuals with HPC1
mutations, either somatic or germline. Alternatively, the cell line can be engineered to carry the
mutation in the HPC1 allele, as described above. After a test substance is applied to the cells, the
neoplastically transformed phenotype of the cell is determined. Any trait of neoplastically
transformed cells can be assessed, including anchorage-independent growth, tumorigenicity in
nude mice, invasiveness of cells, and growth factor dependence. Assays for each of these traits
are known in the art.

Animals for testing therapeutic agents can be selected after mutagenesis of whole
animals or after treatment of germline cells or zygotes. Such treatments include insertion of
mutant HPC1 alleles, usually from a second animal species, as well as insertion of disrupted
homologous genes. Alternatively, the endogenous HPC1 gene(s) of the animals may be
disrupted by insertion or deletion mutation or other genetic alterations using conventional
techniques (Capecchi, 1989; Valancius and Smithies, 1991; Hasty et al., 1991; Shinkai et al.,
1992; Mombaerts et al., 1992; Philpott et al., 1992; Snouwaert et al., 1992; Donehower et al.,
1992) to produce knockout or transplacement animals. A transplacement is similar to a
knockout because the endogenous gene is replaced, but in the case of a transplacement the
replacement is by another version of the same gene. After test substances have been
administered to the animals, the phenotype must be assessed. If the test substance prevents or
suppresses the disease, then the test substance is a candidate therapeutic agent for the treatment
of disease. These animal models provide an extremely important testing vehicle for potential
therapeutic products.

In one embodiment of the invention, transgenic animals are produced which contain a
functional transgene encoding a functional HPC1 polypeptide or variants thereof. Transgenic
animals expressing HPC1 transgenes, recombinant cell lines derived from such animals and
transgenic embryos may be useful in methods for screening for and identifying agents that
induce or repress function of HPC1. Transgenic animals of the present invention also can be
used as models for studying indications such as cancer.

In one embodiment of the invention, an HPC1 transgene is introduced into a non-human
host to produce a transgenic animal expressing a human or murine HPC1 gene. The transgenic
animal is produced by the integration of the transgene into the genome in a manner that permits
the expression of the transgene. Methods for producing transgenic animals are generally
described by Wagner and Hoppe (U.S. Patent No. 4,873,191; which is incorporated herein by
reference), Brinster et al. (1985; which is incorporated herein by reference in its entirety) and in
"Manipulating the Mouse Embryo: A Laboratory Manual," 2nd edition (eds. Hogan,
Beddington, Costantini and Lacy, Cold Spring Harbor Laboratory Press, 1994; which is
incorporated herein by reference in its entirety).

It may be desirable to replace the endogenous HPC1 by homologous recombination
between the transgene and the endogenous gene, or the endogenous gene may be eliminated by
deletion as in the preparation of "knock-out" animals. Typically, an HPC1 gene flanked by
genomic sequences is transferred by microinjection into a fertilized egg. The microinjected eggs
are implanted into a host female, and the progeny are screened for the expression of the
transgene. Transgenic animals may be produced from the fertilized eggs from a number of
animals including, but not limited to, reptiles, amphibians, birds, mammals, and fish. Within a
particularly preferred embodiment, transgenic mice are generated which overexpress HPC1 or
express a mutant form of the polypeptide. Alternatively, the absence of HPC1 in "knock-out"
mice permits the study of the effects that loss of HPC1 protein has on a cell in vivo. Knock-out
mice also provide a model for the development of HPC1-related cancers.

Methods for producing knockout animals are generally described by Shastry (1995,
1998) and Osterrieder and Wolf (1998). The production of conditional knockout animals, in
which the gene is active until knocked out at the desired time, is generally described by Feil et al.
(1996), Gagneten et al. (1997) and Lobe and Nagy (1998). Each of these references is
incorporated herein by reference.

As noted above, transgenic animals and cell lines derived from such animals may find
use in certain testing experiments. In this regard, transgenic animals and cell lines capable of
expressing wild-type or mutant HPC1 may be exposed to test substances. These test substances
can be screened for the ability to reduce overexpression of wild-type HPC1 or impair the
expression or function of mutant HPC1.
Pharmaceutical Compositions and Routes of Administration

The HPC1 polypeptides, antibodies, peptides and nucleic acids of the present invention
can be formulated in pharmaceutical compositions, which are prepared according to conventional
pharmaceutical compounding techniques. See, for example, Remington's Pharmaceutical
Sciences, 18th Ed. (1990, Mack Publishing Co., Easton, PA). The composition may contain the
active agent or pharmaceutically acceptable salts of the active agent. These compositions may
comprise, in addition to one of the active substances, a pharmaceutically acceptable excipient,
carrier, buffer, stabilizer or other materials well known in the art. Such materials should be
non-toxic and should not interfere with the efficacy of the active ingredient. The carrier may take a
wide variety of forms depending on the form of preparation desired for administration, e.g.,
intravenous, oral, intrathecal, epineural or parenteral.

For oral administration, the compounds can be formulated into solid or liquid
preparations such as capsules, pills, tablets, lozenges, melts, powders, suspensions or emulsions.
In preparing the compositions in oral dosage form, any of the usual pharmaceutical media may
be employed, such as, for example, water, glycols, oils, alcohols, flavoring agents, preservatives,
coloring agents, suspending agents, and the like in the case of oral liquid preparations (such as,
for example, suspensions, elixirs and solutions); or carriers such as starches, sugars, diluents,
granulating agents, lubricants, binders, disintegrating agents and the like in the case of oral solid
preparations (such as, for example, powders, capsules and tablets). Because of their ease of
administration, tablets and capsules represent the most advantageous oral dosage unit form, in
which case solid pharmaceutical carriers are obviously employed. If desired, tablets may be
sugar-coated or enteric-coated by standard techniques. The active agent can be encapsulated to
make it stable to passage through the gastrointestinal tract while at the same time allowing for
passage across the blood-brain barrier. See, for example, WO 96/11698.
For parenteral administration, the compound may be dissolved in a pharmaceutical
carrier and administered as either a solution or a suspension. Illustrative of suitable carriers are
water, saline, dextrose solutions, fructose solutions, ethanol, or oils of animal, vegetable or
synthetic origin. The carrier may also contain other ingredients, for example, preservatives,
suspending agents, solubilizing agents, buffers and the like. When the compounds are being
administered intrathecally, they may also be dissolved in cerebrospinal fluid.

The active agent is preferably administered in a therapeutically effective amount. The
actual amount administered, and the rate and time-course of administration, will depend on the
nature and severity of the condition being treated. Prescription of treatment, e.g., decisions on
dosage, timing, etc., is within the responsibility of general practitioners or specialists, and
typically takes account of the disorder to be treated, the condition of the individual patient, the
site of delivery, the method of administration and other factors known to practitioners.
Examples of techniques and protocols can be found in Remington's Pharmaceutical Sciences.

Alternatively, targeting therapies may be used to deliver the active agent more
specifically to certain types of cell, by the use of targeting systems such as antibodies or
cell-specific ligands. Targeting may be desirable for a variety of reasons, e.g., if the agent is
unacceptably toxic, if it would otherwise require too high a dosage, or if it would not
otherwise be able to enter the target cells.

Instead of administering these agents directly, they could be produced in the target cell,
e.g., in a viral vector such as described above or in a cell-based delivery system such as described
in U.S. Patent No. 5,550,050 and published PCT application Nos. WO 92/19195, WO 94/25503,
WO 95/01203, WO 95/05452, WO 96/02286, WO 96/02646, WO 96/40871, WO 96/40959 and
WO 97/12635, designed for implantation in a patient. The vector could be targeted to the
specific cells to be treated, or it could contain regulatory elements which are more tissue-specific
to the target cells. The cell-based delivery system is designed to be implanted in a patient's body
at the desired target site and contains a coding sequence for the active agent. Alternatively, the
agent could be administered in a precursor form for conversion to the active form by an
activating agent produced in, or targeted to, the cells to be treated. See, for example, EP
425,731 A and WO 90/07936.

The present invention is described by reference to the following Examples, which are
offered by way of illustration and are not intended to limit the invention in any manner.
Standard techniques well known in the art or the techniques specifically described below were
utilized.

EXAMPLE 1 

Ascertain and Study Kindreds Likely to Have a
Chromosome 1-Linked Prostate Cancer Susceptibility Locus

Extensive cancer-prone kindreds were ascertained from a defined population, providing a
large set of extended kindreds with multiple cases of prostate cancer and many relatives
available to study. The large number of meioses present in these large kindreds provided the
power to detect whether the HPC1 locus was segregating, and increased the opportunity for
informative recombinants to occur within the small region being investigated. This vastly
improved the chances of establishing linkage to the HPC1 region, and greatly facilitated the
reduction of the HPC1 region to a manageable size, which permits identification of the HPC1
locus itself.

Each kindred was extended through all available connecting relatives, and to all
informative first-degree relatives of each proband or cancer case. For these kindreds, additional
prostate cancer cases and individuals with cancer at other sites of interest (e.g., bladder) who
also appeared in the kindreds were identified through the tumor registry linked files. All
prostate cancers reported in the kindred which were not confirmed in the Utah Cancer Registry
were verified. Medical records or death certificates were obtained for confirmation of all
cancers. Each key connecting individual and all informative individuals were invited to
participate by providing a blood sample from which DNA was extracted. We also sampled
spouses, siblings, and offspring of deceased cases so that the genotypes of the deceased cases
could be inferred from the genotypes of their relatives.

Each of the Utah pedigrees studied represents the descendants of a single founder for
whom a significant excess of prostate cancer cases was observed among all descendants. Since
all affected descendants are studied, the resulting kindreds represent a collection of both closely
and distantly related prostate cancer cases. The criteria for selection of kindreds to analyze for
HPC1 linkage were: 1) genotypes available, or inferable, for 6 or more prostate cancer cases,
and 2) at least 3 genotyped cases within a second degree of relationship to another genotyped
case.
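The two selection criteria above can be expressed as a simple filter. The sketch below is illustrative only: the function name, the data layout (sets of case identifiers plus a table of pairwise degrees of relationship) and the default parameters are assumptions, and criterion 2 is read as requiring at least three genotyped cases that each have at least one other genotyped case within second-degree relationship.

```python
def meets_selection_criteria(
    genotyped: set[str],
    inferable: set[str],
    degree: dict[frozenset, int],
    min_cases: int = 6,
    min_close: int = 3,
    max_degree: int = 2,
) -> bool:
    """Hypothetical kindred filter for the two selection criteria.

    genotyped  -- IDs of prostate cancer cases with measured genotypes
    inferable  -- IDs of cases whose genotypes can be inferred from relatives
    degree     -- frozenset({id_a, id_b}) -> degree of relationship (1 = first degree)
    """
    # Criterion 1: genotypes available or inferable for >= min_cases cases.
    if len(genotyped | inferable) < min_cases:
        return False
    # Criterion 2: count genotyped cases related within max_degree to at least
    # one *other* genotyped case; pairs missing from `degree` are treated as unrelated.
    close = {
        a
        for a in genotyped
        if any(
            b != a and degree.get(frozenset((a, b)), float("inf")) <= max_degree
            for b in genotyped
        )
    }
    return len(close) >= min_close


if __name__ == "__main__":
    # Hypothetical kindred: four genotyped cases and two with inferable genotypes.
    deg = {frozenset(("c1", "c2")): 1, frozenset(("c2", "c3")): 2, frozenset(("c3", "c4")): 4}
    print(meets_selection_criteria({"c1", "c2", "c3", "c4"}, {"c5", "c6"}, deg))  # True
```

In the example, c1, c2 and c3 each have a genotyped relative within second degree, while c4 does not, so criterion 2 is met with exactly three cases.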

The Utah kindreds are 5-7 generations deep and contain between 8 and 29 prostate
cancer cases. They are all Caucasian of Northern European ancestry. The median age-of-onset
for each kindred ranged from 64 to 76, similar to that estimated for the general population. Five
percent of cases were diagnosed before age 55.

For each kindred analyzed, the number of prostate cancer cases, the median age and
range of age-of-onset, and the number of cases and family members sampled and included in this
analysis are detailed in Table 1. The kindreds labeled A-E in Table 1 are the kindreds used for
the data which are shown in Table 3.
Table 1

The 29 Utah Kindreds

                      Age-of-Onset
Kindred   Total       Range         Median        Case        Total
          Cases*                                  Samples+    Samples^
1         9           57-80         73            4           12
2         8           44-91         66            7           24
3         8           56-79         76            4           20
4         18          56-88         74            8           40
5         19          57-88         73            7           49
6         7           60-88         73            14          81
8         16          46-84         76            1           39
9         13          61-82         76            5           51
10 (A)    15          55-88         71            7           31
11        16          60-82         70            8           34
12 (B)    14          56-85         73            9           41
13        14          50-88         68            6           29
14        10          51-85         68            4           34
15        11          45-85         68            4           30
16        17          44-84         71            11          41
17 (C)    11          44-86         66            7           87
18        11          47-81         70            5           22
19        14          [illegible]   79            5           21
20        [illegible] 62-81         71            5           12
21        12          45-83         71            4           21
22        11          58-91         76            7           25
23        8           51-84         64            3           16
24        21          54-87         65            15          41
25 (D)    8           56-78         68            4           34
26        8           60-77         70            3           29
27        11          62-87         67            7           37
28 (E)    10          53-86         67            5           26
29        11          45-86         [illegible]   6           14
Totals    368                                     190         959

*Total affected individuals in the genotyped portion of the kindred.

+Total affected individuals genotyped for the three markers.

^Total individuals genotyped for the three markers (includes affected samples).

EXAMPLE 2

Selection of Kindreds Which are Linked to Chromosome 1 and 
Localization of HPC1 to the Interval mM.GAAA158:23.4 - m M.GA57el5.S16 

Nuclear pellets were extracted from 16 ml of ACD blood, and DNA was extracted with
phenol and chloroform, precipitated with ethanol, and resuspended in Tris-EDTA. The markers
used for genotyping were short tandem repeat (STR) loci at 1q24-25 which flanked the most
likely HPC1 location as indicated in Smith et al. (1996). The order of the markers (designated
D1S followed by a sequential number) and the approximate intervals between them (in
centiMorgans) are:

D1S2883 - 10.8 centiMorgans - D1S254 - 11.5 centiMorgans - D1S412.

The most likely location as suggested in Smith et al. (1996) is at D1S254. 

Amplification of 20 ng genomic DNA was performed according to standard PCR
procedures, with minor modifications to optimize product clarity, in a total reaction mix of 10
µl. Radiolabeled PCR products were electrophoresed on standard 6% polyacrylamide denaturing
sequencing gels. Gels were then dried and autoradiographed. A total of over 200 prostate
cancer cases and approximately 800 of their relatives were genotyped for the markers.

In the kindreds which showed evidence of segregation, up to an additional 35 markers
were used to identify and confirm segregation of multiple linked markers (haplotypes). These
markers were spaced throughout the 28.7 cM region between D1S452 (proximal to D1S2883)
and D1S422 (distal to D1S412), a region which flanks the three originally typed markers by 3.6
cM distally and 2.8 cM proximally.

Two-point linkage analysis was performed with the package LINKAGE (Lathrop et al.,
1984; 1985) using the FASTLINK implementation (Cottingham et al., 1993; Schaffer et al.,
1994). The statistical analysis for the inheritance of susceptibility to prostate cancer used the
model described in Smith et al. (1996). This model assumed a rare autosomal dominant
susceptibility locus and allowed for a 15% sporadic rate of prostate cancer. Marker allele frequencies
were estimated from unrelated individuals present in the kindreds.
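For orientation, the quantity reported by two-point analysis is the Lod score, the base-10 logarithm of the likelihood ratio of linkage at recombination fraction theta against free recombination (theta = 0.5). The sketch below assumes the simplest textbook case of phase-known, fully informative meioses with R recombinants out of N; it does not reproduce the pedigree likelihoods that LINKAGE/FASTLINK compute under the Smith et al. (1996) model with a 15% sporadic rate, and the function name is hypothetical.

```python
import math


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """Lod score for phase-known, fully informative meioses.

    LOD(theta) = log10( theta^R * (1 - theta)^(N - R) / 0.5^N )
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    r, n = recombinants, meioses
    if theta == 0.0:
        # At theta = 0 any observed recombinant makes the likelihood zero.
        if r > 0:
            return float("-inf")
        return n * math.log10(2.0)  # log10(1 / 0.5^N)
    return (r * math.log10(theta)
            + (n - r) * math.log10(1.0 - theta)
            - n * math.log10(0.5))


if __name__ == "__main__":
    # Hypothetical data: 10 informative meioses, 1 recombinant.
    for theta in (0.0, 0.01, 0.10, 0.20, 0.30):
        print(theta, f"{two_point_lod(1, 10, theta):.3f}")
```

For example, `two_point_lod(1, 10, 0.1)` evaluates to about 1.598. Evaluating at theta = 0.00, 0.01, 0.10, 0.20 and 0.30 yields one row in the same layout as Table 2.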

Linkage in the presence of heterogeneity was assessed by the admixture test (A-test) of
Ott (1986). HOMOG, which postulates two family types, linked and unlinked, was used.
Multipoint linkage analysis was performed using VITESSE (O'Connell et al., 1995). The size of
the pedigrees and the lack of genotyping of the higher generations due to the late age-of-onset
made analyses with more than three points impossible. The multipoint results in Figure 3 represent a
walking three-point analysis, with the disease phenotype placed between each pair of adjacent
markers in all intervals but the exterior ones, in which the two closest markers were used.
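The interval scheme used for the walking three-point analysis can be sketched as follows. This is an illustrative reading of the description above, not VITESSE's interface, and the function name is hypothetical. Intervals are numbered from 0 (before the first marker) to n (after the last of n ordered markers); for each interval the function returns the pair of markers analyzed jointly with the disease locus.

```python
def walking_three_point_windows(markers: list[str]) -> list[tuple[int, tuple[str, str]]]:
    """Return (interval_index, (marker_a, marker_b)) for each interval of an ordered map.

    Interior interval i (1 <= i < n) lies between markers i-1 and i, and the
    disease locus is placed between them. The two exterior intervals use the
    two closest markers, as described for the Figure 3 analysis.
    """
    n = len(markers)
    if n < 2:
        raise ValueError("need at least two ordered markers")
    windows = []
    for i in range(n + 1):
        if i == 0:            # exterior interval before the first marker
            pair = (markers[0], markers[1])
        elif i == n:          # exterior interval after the last marker
            pair = (markers[-2], markers[-1])
        else:                 # interior interval flanked by adjacent markers
            pair = (markers[i - 1], markers[i])
        windows.append((i, pair))
    return windows


if __name__ == "__main__":
    for idx, pair in walking_three_point_windows(["D1S2883", "D1S254", "D1S412"]):
        print(idx, pair)
```

With the three originally typed markers, the two exterior intervals reuse the pairs (D1S2883, D1S254) and (D1S254, D1S412).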
The two-point Lod scores for the 29 kindreds combined were highly negative at the 3
markers examined (Table 2), suggesting an overall lack of evidence for this susceptibility locus
across all kindreds. Heterogeneity analysis of the three loci showed weak, non-significant
evidence for one locus, explaining 5% of the pedigrees. The positive Lod score observed for
D1S254 in analysis of heterogeneity, as well as the low estimate of alpha reported in Smith et al.
(1996), suggested that there might be a subset of linked pedigrees within our data set. We
examined three-marker haplotypes in each kindred for evidence of a shared region among
affected individuals. For those kindreds which suggested such segregation, we genotyped samples
for up to an additional 35 markers. In Table 2 these kindreds and their Lod scores for the 3 markers
are shown.

Multipoint linkage results are depicted in Figure 3. This analysis resulted in a maximum
heterogeneity Lod score of +1.20 at D1S254, with an estimated 5% of kindreds linked.
Multipoint heterogeneity analysis in the most likely interval excluded linkage (Lod scores less
than -2.00) for alpha greater than 0.33.

Cancers at sites other than the prostate would also be expected to occur in individuals in
these kindreds. Some individuals hypothesized to share the segregating chromosome 1
haplotype were affected with cancer at another site. These included stomach cancers at ages 56,
68 and 82, ovarian cancer at age 32, and breast cancer at age 49 in kindred 17; colon cancer at
age 87 in kindred 12; and breast cancer at age 72 and colon cancer at age 79 in kindred 25.
Lod scores for linkage for a phenotype of cancer at any site did not differ significantly from
those for prostate cancer alone, although most individuals with cancer at another site were not
included in the sampling.

Table 2

Total Lod scores and heterogeneity Lod scores
for 29 Utah high-risk prostate cancer kindreds

              Lod scores at recombination fraction r                Heterogeneity
Marker      r = 0.00    0.01      0.10      0.20      0.30        Lod (r)        alpha
D1S2883     -40.41      -33.31    -11.41    -3.70     -0.89       0.004 (.3)     0.10
D1S254      -27.18      -21.90    -6.47     -1.52     -0.08       0.482 (.0)     0.05
D1S412      -46.31      -37.49    -12.45    -3.96     -1.00       0.004 (.3)     0.05


Table 3

Maximum Lod scores for the 5 Utah kindreds
with evidence of segregation of the three-marker haplotype

              Maximum Lod (r)
Kindred     D1S2883       D1S254        D1S412
10          0.64 (.0)     0.00 (.5)     0.00 (.5)
12          0.28 (.2)     0.02 (.3)     0.00 (.5)
17          0.43 (.0)     2.04 (.0)     0.39 (.1)
25          0.42 (.0)     0.27 (.0)     0.13 (.2)
28          0.05 (.0)     0.31 (.1)     0.12 (.1)

A maximum Lod score of 0.00 (.5) indicates no evidence for linkage.

Analysis model used for Tables 2 and 3:
Disease gene frequency: 0.003
Unaffecteds age < 75 years: unknown phenotype
Unaffecteds age >= 75 years:
    non-carrier genotype disease penetrance = 0.16
    carrier genotype disease penetrance = 0.63
Affecteds:
    non-carrier genotype disease penetrance = 0.00053
    carrier genotype disease penetrance = 0.50
EXAMPLE 3
Contig assembly

Genomic clone contig assembly in the HPC1 region started from a publicly available
integrated map of chromosome 1, the WICGR Chr 1 map of Nov. 19, 1996. YACs located in
the interval between D1S202 and D1S238 were ordered from Genome Systems (Figure 1).
Primer pairs for the markers located in this interval were synthesized and used to screen a BAC
library at Myriad. Markers that were negative on that BAC library were used to screen the BAC
and PAC libraries at Genome Systems. DNA was prepared from the BACs and PACs that
contained these markers. End sequences were obtained by dye terminator sequencing with vector
primers on ABI 377 sequencers. Primer pairs defining BAC or PAC end markers were designed
from these sequences. These new markers were checked against the YACs to confirm that they
mapped within the interval; if the map data were ambiguous, the markers were also checked
against a radiation hybrid panel. The new markers were also checked against the previously
identified BACs/PACs to determine the positions of these clones relative to each other. The
outside markers from each clone contig were used to screen the Myriad BAC library; those that
were negative on that library were used to screen the BAC and PAC libraries at Genome
Systems. Repeated cycles of library screening and marker development allowed us to build a
BAC/PAC contig that spanned the minimal recombinant interval.
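The marker-content logic behind this contig walk can be sketched briefly: clones that share at least one STS marker are placed in the same contig. This is a simplified illustration, assuming exact marker hits and no chimeric or deleted clones. It groups clones but does not order them within a contig. The function name, clone names, and marker names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_contigs(clone_markers: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group clones into contigs; any two clones sharing a marker are linked."""
    parent = {clone: clone for clone in clone_markers}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index clones by the markers they were positive for.
    clones_by_marker: Dict[str, List[str]] = defaultdict(list)
    for clone, markers in clone_markers.items():
        for marker in markers:
            clones_by_marker[marker].append(clone)
    for clones in clones_by_marker.values():
        for other in clones[1:]:
            union(clones[0], other)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for clone in clone_markers:
        groups[find(clone)].add(clone)
    # Deterministic output order: sort contigs by their sorted member names.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    hits = {  # hypothetical screening results
        "BAC_1": {"D1S202", "end_1R"},
        "PAC_2": {"end_1R", "end_2R"},
        "BAC_3": {"D1S238"},
    }
    print(build_contigs(hits))  # two contigs: BAC_1 + PAC_2, and BAC_3 alone
```

In a walk like the one described, the outermost end markers of each group would be the next probes, and the cycle repeats until a single group spans the interval.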
As shown in our physical map of the HPC1 locus (Figure 1), a 15-clone BAC/PAC
contig spans the interval between D1S202 and D1S238. Based on the genetic data described in
detail above, the HPC1 locus must lie in the interval between the markers mM.GAAA158j23.4
and mM:GA57el5.S6. This interval is spanned by a 10-clone BAC/PAC contig. Based on the
sizes and map positions of the YACs in the region, the sizes of the BACs and PACs in the
contig, and extensive sequencing of those BACs and PACs, we estimate the size of the minimal
genetically defined interval containing HPC1 to be 750 kb.

EXAMPLE 4 
Genomic sequencing 

Two different types of genomic sequencing sublibraries were prepared from BAC or
PAC clones in the candidate region.

Random-Sheared Sequencing Sub-libraries: BAC or PAC DNA was sheared by
sonication. To generate blunt-ended fragments, the sonicated DNA was incubated with
mung bean nuclease (Pharmacia Biotech) followed by treatment with a Pfu polishing kit
(Stratagene). The DNA fragments were size-fractionated on a 0.8% TAE agarose gel, and
fragments in the size range of 1.0-1.6 kb were excised under long-wave (365 nm) ultraviolet
light. The excised gel slice was rotated 180 degrees relative to the original direction of
electrophoresis and then placed into a new gel tray containing 1.0% GTG SeaPlaque
low-melting-temperature agarose (FMC Corporation) before the gel solidified. Electrophoresis
was repeated for the same time and voltage as the first run, concentrating the DNA fragments in
a small volume of agarose, and the gel slice containing the DNA fragments was once again
excised from the gel. The DNA fragments were purified from the agarose by incubating the gel
slice with beta-agarase (New England Biolabs), followed by removal of the agarose monomers
using disposable microconcentrators (Amicon) with a 50,000-dalton molecular weight cutoff
filter. DNA fragments were ligated into the HincII site of the plasmid pMYG2, a pBluescript
(Stratagene) derivative in which the polylinker has been replaced by the pMYG2 polylinker. The
vector was prepared by digestion with HincII followed by dephosphorylation with calf alkaline
phosphatase (Boehringer Mannheim).



Table 4

Cloning Sites in pMYG1 and pMYG2

Name                Sequence                                     Sequence ID#
pMYG2 polylinker    ATGACCATAGTCGACCTGGCCGTCGTT                  55
pMYG1 polylinker    ATGACCATAGTCGACGGATCCGTCGACCTGGCCGTCGTT      56
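The cloning sites named in the text can be checked against the polylinker sequences in Table 4. The sketch below searches each polylinker for the HincII site (GTYRAC, blunt), the BamHI site (GGATCC), and the Sau3AI site (GATC). BamHI leaves a 5'-GATC overhang identical to the one Sau3AI leaves, which is what allows Sau3A partial-digest fragments to be ligated into the BamHI site of pMYG1. The recognition sequences are standard. The helper name `find_sites` and the reduced IUPAC table are illustrative assumptions. All three sites are palindromic, so searching one strand is sufficient.

```python
import re
from typing import List

# Subset of IUPAC nucleotide codes needed for the sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SITES = {"HincII": "GTYRAC", "BamHI": "GGATCC", "Sau3AI": "GATC"}

PMYG1 = "ATGACCATAGTCGACGGATCCGTCGACCTGGCCGTCGTT"  # Table 4, Sequence ID# 56
PMYG2 = "ATGACCATAGTCGACCTGGCCGTCGTT"  # Table 4, Sequence ID# 55


def find_sites(seq: str, site: str) -> List[int]:
    """0-based start positions of a (possibly degenerate) site on the given strand."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    # A zero-width lookahead also reports overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


if __name__ == "__main__":
    for name, seq in (("pMYG1", PMYG1), ("pMYG2", PMYG2)):
        for enzyme, site in SITES.items():
            print(name, enzyme, find_sites(seq, site))
```

The scan shows a single HincII site in pMYG2 (within GTCGAC), suitable for blunt-ended random-sheared fragments. In pMYG1, one BamHI site is flanked by two GTCGAC sequences.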



Ligated products were transformed into DH5-alpha E. coli competent cells (Life
Technologies, Inc.) and plated on LB plates containing ampicillin, IPTG, and Bluo-gal (Sigma;
Life Technologies, Inc.). White colonies were used to inoculate individual wells of 1 ml
96-well microtiter plates (Beckman) containing 200 microliters of LB medium supplemented with
ampicillin at 50 micrograms per milliliter. The plates were incubated for 16-20 hours in a
shaking incubator at 37 degrees Celsius. After incubation, 20 microliters of dimethyl sulfoxide
was added to each well and the plates were stored frozen. The inserts of random-sheared clones
were amplified from E. coli cultures by PCR with vector primers, and the PCR products were
sequenced with M13 forward or reverse fluorescent energy transfer (FET) dye-labeled primers
on ABI 377 sequencers.

Sau3A Sequencing Sub-libraries: BAC or PAC DNA was partially digested with the
restriction enzyme Sau3A, and fragments in the size range of 5-8 kb were size-fractionated and
recovered from the agarose gel as described above for random-sheared fragments. Sau3A
fragments were ligated into the BamHI site of pMYG1, a pBluescript (Stratagene) derivative
in which the polylinker has been replaced by the pMYG1 polylinker. The vector was prepared by
digestion with BamHI and dephosphorylation with shrimp alkaline phosphatase (Amersham).
The ligated products were transformed and plated as described above for random-sheared clones.

To identify clones containing inserts in the size range of 5-8 kb, bacterial colonies were
screened using a plasmid preparation procedure adapted for a 96-well format. White colonies
were picked into individual wells of 2 ml 96-well plates (Continental Laboratory Products)
containing 1 ml LB medium supplemented with 200 micrograms per milliliter ampicillin. The
plates were incubated 16-20 hours in a shaking incubator at 37 degrees Celsius. A bacterial stock
of these clones was prepared by transferring 100 microliters of the 1 ml cultures to another
96-well plate containing 200 microliters of LB medium supplemented with ampicillin. The
remaining cells were pelleted by centrifugation and the pellets resuspended in 200 microliters of
LB medium. One hundred microliters of the concentrated cells were transferred to a 96-well
Thermowell PCR plate (Costar), and the cells were once again pelleted. The pelleted cells were
resuspended in lysis buffer [250 mM Tris-HCl, pH 8.0, 50 mM EDTA, pH 8.0, 8% sucrose,
5% Triton X-100, 1 mM tartrazine, and 666 micrograms per milliliter lysozyme], and the plates
were covered with Thermowell lids (Costar) and incubated in an MJ Research thermocycler for
2 minutes at 100 degrees Celsius followed by 2 minutes at 25 degrees Celsius. Cell debris was
pelleted by centrifugation, and 15 microliters of the supernatant containing the plasmid DNA
was electrophoresed on a 0.8% agarose gel in 0.6x TBE with appropriate supercoiled size
standards to estimate the size of each clone.
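Estimating clone size from supercoiled standards is usually done by interpolation. The sketch below fits a straight line of log10(size) against migration distance and reads unknowns off that line. This semi-logarithmic fit is a common approximation assumed here; the source does not state how sizes were read from the gel. Supercoiled and linear DNA migrate differently, which is why the standards must themselves be supercoiled, as in the text. Function names and distances are hypothetical.

```python
import math
from typing import List, Tuple


def fit_semilog(standards: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(size) = slope * distance + intercept.

    standards: (migration distance, size in bp) for each supercoiled standard band.
    """
    if len(standards) < 2:
        raise ValueError("need at least two standards")
    xs = [d for d, _ in standards]
    ys = [math.log10(size) for _, size in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_size(distance: float, slope: float, intercept: float) -> float:
    """Interpolated size (bp) of a band at the given migration distance."""
    return 10.0 ** (slope * distance + intercept)


if __name__ == "__main__":
    # Hypothetical standards (distance in mm, size in bp).
    slope, intercept = fit_semilog([(10.0, 10000.0), (20.0, 3162.3), (30.0, 1000.0)])
    print(round(estimate_size(15.0, slope, intercept)))
```

Only clones whose estimated total size implies an insert in the 5-8 kb window would then be carried forward.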

The bacterial stocks of clones with inserts in the 5-8 kb size range were used to inoculate
3 ml cultures of LB medium supplemented with ampicillin, which were incubated overnight in a
shaking incubator at 37 degrees Celsius. Plasmid DNA was prepared from these cultures using
the Autogen robotic plasmid preparation machine (Integrated Separation Systems). The
resulting DNA templates were sequenced from both ends with M13 forward or reverse
fluorescent energy transfer (FET) dye-labeled primers on ABI 377 sequencers.

DNA sequencing gel files were examined for lane-tracking accuracy and adjusted where
necessary before data extraction. ABI sample files resulting from the gel files were converted to
the Standard Chromatogram Format (SCF) (Dear and Staden) and trimmed of sequencing vector
(pMYG1 or pMYG2). Trimmed sequences were assembled using Acembly (Thierry-Mieg et al.,
1995; Durbin and Thierry-Mieg, 1991). Contiguous sequence resulting from automatic
assembly was screened for residual vector sequence (both sequencing vector and cloning vector)
as well as for bacterial contamination using BLAST (Altschul et al., 1990).

Remaining sequences were arranged according to the relative position and orientation of
assembled Sau3AI partial-digest clone sequence reads, as well as sequence similarity to
overlapping genomic clones. Repetitive sequence was masked from the sequence contigs using
xblast (Claverie and States, 1993). These masked sequences were placed in a Genetic Data
Environment (GDE) (Smith et al., 1994) local database for subsequent similarity searches.
Similarities among genomic DNA sequences, hybrid-selected cDNA clones, and GenBank
entries (both DNA and protein) were identified using BLAST. DNA sequences were also
characterized with respect to short-period repeats, CpG content, and long open reading frames.
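Two of the sequence characteristics mentioned, CpG content and long open reading frames, can be computed with simple scans. The sketch below uses the commonly used observed/expected CpG ratio, (CpG count × length) / (C count × G count), and finds the longest ATG-to-stop ORF on the forward strand only. The source does not specify the metrics, window sizes, or thresholds actually used, so these choices and the function names are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def longest_orf(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG-to-stop ORF, forward strand only."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    example = "CCATGTTTTGACCGCG"  # hypothetical fragment
    print(round(cpg_obs_exp(example), 2), longest_orf(example))
```

A full scan would also examine the reverse complement and report ORFs that run off the end of a contig, since exons of a multi-exon gene like HPC1 are rarely bounded by ATG and stop codons.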

EXAMPLE 5 
Hybrid selection

Two distinct methods of hybrid selection were used in this work.

Method 1: cDNA preparation and selection. Poly(A)-enriched RNA from human
mammary gland, prostate, testis, fetal brain, and placenta tissues, and total RNA from the cell
line Caco-2 (ATCC HTB 37), were reverse transcribed using the tailed random primer RXGN6
and M-MLV reverse transcriptase (Life Technologies, Inc.). First-strand cDNA was poly(A)
tailed, second-strand synthesis was primed with the oligo RXGT12, and the ds cDNA was then
expanded by amplification with the primer RXG. Hybrid selection was carried out for two
consecutive rounds of hybridization to immobilized BAC, PAC, or gel-purified YAC DNA as
described previously (Parimoo et al., 1991; Rommens et al., 1994). Individual gel-purified
YACs or groups of two to four overlapping BAC and/or PAC clones were used in individual
selection experiments. Hybridizing cDNA was collected, passed over a G50 Fine Sephadex
column, and amplified using tailed primers. The products were then digested with EcoRI,
size-selected on agarose gels, and ligated into pBluescript (Stratagene) that had been digested
with EcoRI and treated with calf alkaline phosphatase (Boehringer Mannheim). Ligation
products were transformed into competent DH5-alpha E. coli cells (Life Technologies, Inc.).

Characterization of Retrieved cDNAs. Between 200 and 300 individual colonies from each
ligation (from each 250 kb of genomic DNA) were picked and gridded into microtiter plates for
ordering and storage. Cultures were replica-transferred onto Hybond N membranes (Amersham)
supported by LB agar with ampicillin. Colonies were allowed to propagate and were
subsequently lysed with standard procedures. Initial analysis of the cDNA clones involved a
prescreen for ribosomal sequences and subsequent cross-screening to detect overlap and
redundancy.

Approximately 10-25% of the clones were eliminated because they hybridized strongly with
radiolabeled cDNA obtained from total RNA. Plasmids from 25 to 50 clones from each
selection experiment that did not hybridize in prescreening were isolated for further analysis.
The retrieved cDNA fragments were verified to originate from the individual starting genomic
clones by hybridization to restriction digests of DNAs of the starting clones, of a hamster hybrid
cell line that contains chromosome 1 as its only human material, and of human genomic DNA.
The clones were tentatively assigned into groups based on the overlapping or non-overlapping
intervals of the genomic clones.

Method 2: cDNA Preparation. Poly(A)-enriched RNA from human mammary gland,
fetal brain, lymphocyte, pancreas, prostate, stomach, and thymus was reverse transcribed using
the tailed random primer XN12 and Superscript II reverse transcriptase (Gibco BRL). After
second-strand synthesis and end polishing, the ds cDNA was purified on Sepharose CL-4B
columns (Pharmacia). cDNAs were "anchored" by ligation of a double-stranded oligo RP (RP-2
annealed to RL-1) to their 5' ends (5' relative to mRNA) using T4 DNA ligase. Anchored ds
cDNA was then repurified on Sepharose CL-4B columns.

Selection was performed by a modified procedure of Lovett et al. (1991). cDNAs from
mammary gland, fetal brain, lymphocyte, pancreas, prostate, stomach, and thymus tissues were
first expanded by amplification using RP.A (a nested version of RP) and XPCR, and purified by
fractionation on Sepharose CL-4B. Selection probes were prepared from purified P1s, BACs, or
PACs by digestion with HinfI and Exonuclease III. The single-stranded probe was
photolabeled with photobiotin (Gibco BRL) according to the manufacturer's recommendations.
Probe, cDNA, Cot-1 DNA, and poly(A) DNA were hybridized in 2.4 M TEACl, 10 mM
NaPO4, 1 mM EDTA. Hybridized cDNAs were captured on streptavidin-paramagnetic particles
(Dynal), eluted, reamplified with RP.B (a further nested version of RP) and XPCR, and gel
purified. The selected, amplified cDNA was hybridized with an additional aliquot of probe,
Cot-1 DNA, and poly(A) DNA. Captured and eluted products were amplified again with RP.B
and XPCR, size-selected by gel electrophoresis, and cloned into dephosphorylated HincII-cut
pUC18. Ligation products were transformed into XL2-Blue ultracompetent cells (Stratagene).

Both methods: Insert-containing clones were identified by blue/white selection on X-gal
or Bluo-gal plates. Inserts were amplified by colony PCR with vector primers and then
sequenced on ABI 377 sequencers. Alignment of these cDNA sequences to the corresponding
genomic sequences, and parsing of the revealed exons across those genomic sequences, allowed
initial characterization of genes located within the region.



Table 5

Oligonucleotides Used for Hybrid Selection

Name      Sequence                                            Sequence ID#
RXGN6     5'-CGGAATTCTGCAGATCT[illegible]NNNNNN               57
RXGT12    5'-CGGAATTCTGCAGATCTTTTTTTTTTTTT                    58
RXG       5'-CGGAATTCTGCAGATCT                                59
XN12      5'-(NH2)-GTAGTGCAAGGCTCGAGAACNNNNNNNNNNNN           60
RP-2      5'-(NH2)-TGAGTAGAATTCTAACGGCCGTCATTGTTC             61
RL-1      5'-GAACAATGACGGCCGTTAGAATTCTACTCA-(NH2)             62
RP.A      5'-TGAGTAGAATTCTAACGGCCGTCAT                        63
XPCR      5'-(PO4)-GTAGTGCAAGGCTCGAGAAC                       64
RP.B      5'-(PO4)-TGAGTAGAATTCTAACGGCCGTCATTG                65
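The anchor oligonucleotides in Table 5 are designed to work as a set. RL-1 is the exact reverse complement of RP-2, so the two anneal into the fully double-stranded RP anchor. RP.A and RP.B share the 5' end of RP-2, with RP.B extending two bases further into it. The sketch below checks both relationships from the listed sequences. It ignores the 5' amino (NH2) and phosphate (PO4) modifications, and the helper names are illustrative.

```python
# Complement table for unambiguous bases; N maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Sequences from Table 5, without their 5'/3' chemical modifications.
RP_2 = "TGAGTAGAATTCTAACGGCCGTCATTGTTC"
RL_1 = "GAACAATGACGGCCGTTAGAATTCTACTCA"
RP_A = "TGAGTAGAATTCTAACGGCCGTCAT"
RP_B = "TGAGTAGAATTCTAACGGCCGTCATTG"


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # True: RL-1 pairs with RP-2 along its full length.
    print(reverse_complement(RP_2) == RL_1)
    # True True True: RP.A and RP.B are 5' portions of RP-2, RP.B one step longer.
    print(RP_2.startswith(RP_A), RP_2.startswith(RP_B), RP_B.startswith(RP_A))
```

The same check applies to the XPCR family. XPCR is the common 5' portion of XN12, XT15, and XT4, which is why XPCR can serve as the anchor primer for cDNA made with any of them.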



EXAMPLE 6 

Inter-exon PCR and RACE for the identification
of new exons (5', 3', or internal) of the HPC1 gene

Inter-exon PCR: Following sequence analysis of the first three hybrid-selected clones
that originated from HPC1, several primers were designed to amplify HPC1 products from
fetal brain, breast, pancreas, prostate, stomach, and thymus cDNAs. This experiment revealed
two important pieces of data: (1) the transcript is not abundant, but it was considerably more
abundant in prostate and thymus cDNA than in the other tissues tested; and (2) the transcript
is subject to a complex pattern of alternative splicing. Specifically, amplification from 1 ng of
prostate and 1 ng of thymus cDNA with the primers 07CG01#F1 and 07CG01#BR1 (see Table
6) and TaqPlus DNA polymerase (Stratagene) revealed more than 8 distinct splice variants of the
HPC1 transcript. Amplification was by hot-start PCR: an initial denaturation step at 95°C for
30 s was followed by a pause at 80°C while the polymerase/nucleotide mixture was added to the
template/primer mixtures. The hot start was followed by 35 cycles of denaturation at 96°C (4 s),
annealing at 60°C (10 s), and extension at 72°C (60 s). The bands representing these splice
variants were plugged, reamplified, and sequenced using dye terminator chemistry on ABI 377
sequencers. Parsing of these cDNA sequences across the genomic sequence of HPC1 revealed
several new exons.
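The cycling program above can be written as data, which makes the arithmetic explicit. The programmed hold time is 30 s + 35 × (4 + 10 + 60) s = 2620 s, or about 43.7 minutes. That figure excludes the manually timed 80°C pause and the thermocycler's ramp times, neither of which is given in the text. The `Step` record and function name are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


# Inter-exon PCR program as described; the 80°C pause for adding the
# polymerase/nucleotide mixture has no fixed length and is omitted.
HOT_START = [Step("initial denaturation", 95.0, 30)]
CYCLE = [
    Step("denaturation", 96.0, 4),
    Step("annealing", 60.0, 10),
    Step("extension", 72.0, 60),
]
N_CYCLES = 35


def hold_time_seconds(initial: List[Step], cycle: List[Step], n_cycles: int) -> int:
    """Total programmed hold time, ignoring ramp times between temperatures."""
    return sum(s.seconds for s in initial) + n_cycles * sum(s.seconds for s in cycle)


if __name__ == "__main__":
    total = hold_time_seconds(HOT_START, CYCLE, N_CYCLES)
    print(total, "s =", round(total / 60, 1), "min")  # 2620 s = 43.7 min
```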
5' RACE: The 5' end exons of HPC1 were identified by a modified RACE protocol
called biotin capture 5' RACE (Tavtigian et al., 1996). Poly(A)-enriched RNA from prostate
was reverse transcribed using the tailed random primer XN12 and Superscript II reverse
transcriptase (Gibco BRL). After second-strand synthesis and end polishing, the ds cDNA was
purified on Sepharose CL-4B columns (Pharmacia). cDNAs were "anchored" by ligation of a
double-stranded oligo RP (RP-2 annealed to RL-1) to their 5' ends (5' relative to mRNA) using
T4 DNA ligase. Anchored ds cDNA was then repurified on Sepharose CL-4B columns.

The 5' sequences of HPC1 were amplified using two primer combinations: (1)
biotinylated reverse primer 07CG01#BR1 (see Table 6) and RP.A, and (2) biotinylated reverse
primer 07CG01#BR2 (see Table 6) and RP.A. PCR products were fractionated on an agarose
gel, gel purified, and captured on streptavidin-paramagnetic particles (Dynal). Material captured
after amplification with 07CG01#BR1 and RP.A was reamplified using the nested
phosphorylated reverse primer 07CG01#PR2 and a further nested version of RP-2, RP.B.
Material captured after amplification with 07CG01#BR2 and RP.A was reamplified using the
nested phosphorylated reverse primer 07CG01#PR3 and RP.B. These PCR reactions gave
several bands on an agarose gel; the PCR products were gel purified and sequenced in the
reverse direction, using primer 07CG01#PR2 and/or 07CG01#PR3 with dye terminator
chemistry on an ABI 377 sequencer.

3' RACE: A 3' end exon of HPC1 was identified by a modified RACE protocol called
biotin capture 3' RACE. Poly(A)-enriched RNA from prostate was reverse transcribed using the
tailed oligo(dT) primer XT15 and Superscript II reverse transcriptase (Life Technologies). The
first-strand (heteroduplex) cDNA was purified by fractionation on a Sepharose CL-6B column.

The 3' sequence of HPC1 was amplified with the biotinylated forward primer
07CG01#BF4 and the anchor primer XPCR. PCR products amplified with these primers were
fractionated on an agarose gel, gel purified, and captured on streptavidin-paramagnetic particles
(Dynal). Captured material was reamplified using the nested phosphorylated forward primer
07CG01#PF5 and XT4.

PCR products were gel purified, ligated into the vector pMYG2, and transformed into
DH5-alpha cells. Colony PCR products were sequenced using primers 07CG01#PF5 and XPCR
with dye terminator chemistry on an ABI 377 sequencer.

Table 6

Oligonucleotides Used for RACE

Name          Sequence                                              Sequence ID#
XT15          5'-(NH2)-GTAGTGCAAGGCTCGAGAACTTTTTTTTTTTTTTT          66
XT4           5'-(PO4)-GTAGTGCAAGGCTCGAGAACTTTT                     67
XN12          5'-(NH2)-GTAGTGCAAGGCTCGAGAACNNNNNNNNNNNN             68
RP-2          5'-(NH2)-TGAGTAGAATTCTAACGGCCGTCATTGTTC               69
RL-1          5'-GAACAATGACGGCCGTTAGAATTCTACTCA-(NH2)               70
RP.A          5'-TGAGTAGAATTCTAACGGCCGTCAT                          71
XPCR          5'-(PO4)-GTAGTGCAAGGCTCGAGAAC                         72
RP.B          5'-(PO4)-TGAGTAGAATTCTAACGGCCGTCATTG                  73
07CG01#F1     5'-AGG AAG TAT ATC TAA GTC ACC TCC A                  74
07CG01#BR1    5'-(Biotin)-AA TTC CAG ACA GAT TGC AGG CAC            75
07CG01#PR2    5'-(PO4)-AG AGG ACT TGT TCC CCA TAA TTG               76
07CG01#BR2    5'-(Biotin)-AG AGG ACT TGT TCC CCA TAA TTG            77
07CG01#PR3    5'-(PO4)-TT ACG GCT ACT GGA GGT GAC TTA               78
07CG01#PR4    5'-(PO4)-AA GTC TCC AGG GCA CAT CTG A                 79
07CG01#BF4    5'-(Biotin)-GAAGAAAGAACACTCAGATGTGC                   80
07CG01#PF5    5'-(PO4)-CGAAGGAAAGCTTCCAATTATG                       81



EXAMPLE 7 

cDNA library screening

Radioactive probes prepared from two hybrid-selected clones representative of HPC1
transcripts (mH179ol2-4B03 and mH179ol2-3B06) were used to screen a total of 5.5 x 10^6
recombinant phage from a human prostate λgt11 cDNA library (HL1131b, Clontech).
Prehybridization and hybridization were performed at 42°C in 50% formamide, 5X SSPE, 0.1%
SDS, 5X Denhardt's mixture, 0.2 mg/ml denatured salmon sperm DNA, and 2 mg/ml poly(A).
Dextran sulfate (4% v/v) was included in the hybridization solution only. The filters were rinsed
in 2X SSC for 10 minutes at room temperature and then in 2X SSC/0.1% SDS for 30 minutes at
60°C, followed by two washes in 1X SSC/0.1% SDS for 20 minutes each at 60°C. Positive
phage were retested in second and third screenings, as required, to obtain purified plaques for
sequencing. Inserts were amplified by phage PCR with vector primers and then sequenced using
dye terminator chemistry on ABI 377 sequencers.

EXAMPLE 8 
Mutation screening 

Both genomic DNA and cDNA were used as templates for mutation screening.

Genomic DNA: Using genomic DNAs from prostate kindred members, prostate cancer
affecteds, and tumor cell lines as templates, nested PCR amplifications were performed to
generate PCR products of the candidate genes, which were screened for HPC1 mutations. The
primers listed in Table 7 were used to produce amplicons of the HPC1 gene. Using the outer
primer pair for each exon (FA-RP, i.e., forward A and reverse P), 1-10 ng of genomic DNA was
subjected to a 23-26 cycle primary amplification, after which the PCR products were diluted
60-fold and reamplified using nested M13-tailed primers (FB-RQ, FC-RR, or FB-RR) for another
20-25 cycles; either TaqPlus (Stratagene) or AmpliTaq Gold (Perkin Elmer) was used in the
PCRs. In general, the PCR conditions were an initial denaturation step at 95°C for 1 min
(TaqPlus) or 10 min (AmpliTaq Gold), followed by cycles of denaturation at 96°C (12 s),
annealing at 55°C (15 s), and extension at 72°C (45-60 s). PCR products were sequenced with
M13 forward or reverse fluorescent energy transfer (FET) dye-labeled primers on ABI 377
sequencers. Chromatograms were analyzed for the presence of polymorphisms or sequence
aberrations in either the Macintosh program Sequencher (Gene Codes) or the Java program
Mutscreen (Myriad, proprietary).
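Every nested B/C primer in Table 7 begins with the same 18-base tail (GTT TTC CCA GTC ACG ACG), and every Q/R primer begins with the same 19-base tail (AGG AAA CAG CTA TGA CCA T). These are the M13 tails that let all products be sequenced with the universal M13 dye-labeled primers. The sketch below separates a tailed primer into its tail and its gene-specific 3' portion. It assumes primers are written 5' to 3' with optional spaces, as in Table 7. The function name is illustrative.

```python
from typing import Optional, Tuple

# 5' tails shared by the nested forward (B/C) and reverse (Q/R) primers in Table 7.
M13_FORWARD_TAIL = "GTTTTCCCAGTCACGACG"
M13_REVERSE_TAIL = "AGGAAACAGCTATGACCAT"


def split_tail(primer: str) -> Tuple[Optional[str], str]:
    """Return (tail label or None, gene-specific portion) for a primer written 5' -> 3'."""
    seq = primer.replace(" ", "").upper()
    for label, tail in (("M13 forward", M13_FORWARD_TAIL), ("M13 reverse", M13_REVERSE_TAIL)):
        if seq.startswith(tail):
            return label, seq[len(tail):]
    return None, seq  # untailed outer (A/P) primer


if __name__ == "__main__":
    # ca7 CG01.m2B, m2Q, and m2A from Table 7.
    for p in ("GTT TTC CCA GTC ACG ACG TGA AGC CAC AGA GTT TTA GAG",
              "AGG AAA CAG CTA TGA CCA TTG TTC TCA AAT AAT GTC CCA AA",
              "TAG CAT TGT TTG AAG CCA CAG"):
        print(split_tail(p))
```

The gene-specific portions, not the full tailed primers, are what should be compared against the HPC1 genomic sequence when checking primer placement.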
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Table 7 

for Mutation Screening from Genomic DNA 



Mut 

Amplicon 



ame 



Sequence 



SEQ ID 

NO: 



CA7.CG1. 
ml 



CA7.CG1. 
m2 



CA7.CG1 
m3 



CA7.CG1 
m4 



ca7.CG01. ml A 

ca7.CG01. ml A.a 
ca7.CG01. m1P 
ca7.CG01. m1B 

ca7.CG01. mlB.a 

ca7.CG01. m1Q 

ca7.CG01. mlQ.a 

CA7.CG01.m2A 

CA7.CG01.m2 P 
CA7.CG01.m2B 

CA7.CG01.m2Q 

CA7.CG01.m3A 

CA7.CG01.m3P 
CA7.CG01.m3B 

CA7.CG01.m3Q 

ca7.CG01.4A 
ca7.CG01.4P 



GTA ATG AAA TCT GAG AAG CTG AA 

CAC ACA GTG GTT AAT CAT AAA TAC 
CAC AAA GGT ATC TTT TAA GTT CC 
GTT TTC CCA GTC ACG ACG GAA GCT 
GAA TTT AGC AAT ACA GA 
GTT TTC CCA GTC ACG ACG TTA TCT 
GTT CAC TTC ACC TTT G 
AGG AAA CAG CTA TGA CCA TCC TGA 
GCT TTC AAA AAA GTA TTC 
AGG AAA CAG CTA TGA CCA TGG TCT 
CA CTT TTC ATT TAC TTC 

TAG CAT TGT TTG AAG CCA CAG 

CTG GAA GAA ACC TGT AAC TTG 
GTT TTC CCA GTC ACG ACG TGA AGC 
CAC AGA GTT TTA GAG 
AGG AAA CAG CTA TGA CCA TTG TTC 
TCA AAT AAT GTC CCA AA 

GTA ATG CTA TAA TGT TTG AAA GG 

TTC AGG CTA ACT TCC ATC TTC 
GTT TTC CCA GTC ACG ACG GGT TAC 
CCC AAC ATA CCT ATG 
AGG AAA CAG CTA TGA CCA TAA ATA 
GCA TAC ATA ATG TTT ATT C 

CAA AGA GTA TGG GAG GCT GA 

ACT TCA GAG AAC AAC TTC GTC C 



82 

83 
84 
85 

86 

87 

88 

89 

90 . 
91 

92 

93 

94 

95 

96 

97 
98 
99~~ 
100 



101 

102 
103 



CA7.CG1 
m5 



ca7.CG01.4B 
ca7.CG01.4Q 

ca7cg1.m5 A 

ca7cg1.m5 P 
ca7cg1.m5 B 



GTT TTC CCA GTC ACG ACG GGC TGA 
GAC TGA CTT GAC TAT T 
AGG AAA CAG CTA TGA CCA TGA GGG 
TCC ATG AGG CTT C 

GTG AAT GGC TAG ATC CCC TTT 

AAT GAA CCT ACA GTG AGG CAG 
GTT TTC CCA GTC ACG ACG AAA GAC 
AAC CAC TCT AAT GTG C 
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CA7.CG1. 
Im6 



CA7.CG1 
Im7 



PCR+seq 
PCR 

CA7.CG1 
Im8 



ca7cg1.m5 Q 

ca7cg1.m6 A 

ca7cg1.m6 P 
ca7cg1.m6 B 

ca7cg1.m6 Q 

ca7cg1.m6 C 

ca7cg1.m6R 

ca7cg1.m7 A 

ca7cg1.m7 P 
ca7cg1.m7 B 

ca7cg1.m7Q 

m7f2.ca7cg1 
FAM-m7r1.ca7cg1 

ca7cg1.m8 A 



AGG AAA CAG CTA TGA CCA TGT TCT 1 04 
TTT ACA TCT TAA CCC AG 

TCT AGT CAG CCT TCT TGA AC 1 05 

GAC GTA ACA GCT AAA ACG AA 106 

GTT TTC CCA GTC ACG ACG CCT TCT 1 07 
TGA ACT AGA ACT TG 

AGG AAA CAG CTA TGA CCA TCA GGG 1 08 
TTT ATC CTT ATG AA 

GTT TTC CCA GTC ACG ACG TCA CAT 1 09 
GCT CAA AAT CTA AA 

AGG AAA CAG CTA TGA CCA TAA GGC 1 1 0 
AATCTTTCC AGT G 

CTG AAT TGG GGT TTG TCT TG 111 

AAA GAA AGC AGA ACC TTA GC 112 

GTT TTC CCA GTC ACG ACG TTC TCC 1 1 3 
TTA CCA TTA GAG CA 

AGG AAA CAG CTA TGA CCA TAT AGG 1 1 4 
TGG CCT TGT TAT GTA 

TTC TCC TTA CCA TTA GAG CAC 1 1 5 

[FAMJ-CC TTC GGA TTT GTT CAA GTC 116 

CCA TTT GCC TAA TGA ATG AA 117 



CA7.CG1 
m9 



ca7cg1.m8P 
ca7cg1.m8 B 

ca7cg1.m8 Q 

ca7cg1.m8 C 

ca7cg1.m8 R 

ca7cg1.A 

ca7cg1.P 
ca7cg1 .B 

ca7cg1.Q 

ca7cg1.C 

ca7cg1.R 



GTC AGA AAA TCT TGG GTGTA 118 

GTT TTC CCA GTC ACG ACG CTT AAG 119 
AAA GAG ATT GCC A 

AGG AAA CAG CTA TGA CCA TGC AAT 120 

GTG GTA TTA CAA CTT A 

GTT TTC CCA GTC ACG ACG AAA ATA 121 

AGC TGT CTC TGA AG 

AGG AAA CAG CTA TGA CCA TGG GTG 1 22 
TAA AAT AAT TTC TGG 

CGT CTT ACT CAG TTT TGT ATT CT 1 23 

CAT CTA GAA GTA TGC ATT TGG TA 1 24 

GTT TTC CCA GTC ACG ACG TGA ATC 125 

TTA TTT TCT GCA AGG C 

AGG AAA CAG CTA TGA CCA TTC AAA 1 26 

TAA GGT ATA AAG ACA GAG I 

GTT TTC CCA GTC ACG ACG AAT CCC 1 27 

TGA ATG GAT AGC ACC C 

AGG AAA CAG CTA TGA CCA TAA ATC 1 28 
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ACA AAA ATG TCT AAG GTT 




CA7.CG1. 


ca7cg1.m10A 


CTG AAT CTC CCC TAT TAG AAG T 


129 


m10 










ca7cg1.m10P 


AAG GCC ATT AA GAG GTT CTT AG 


130 




ca7cg1.m10B 


GTT TTC CCA GTC ACG ACG GAG TTA 


131 . 




CAT TCA III IIC GAU 1 0 






ca7cg1.m10Q 


A y\ AAA P A P ""P" A T/"^ A PP A '1 11 P A A 

AGG AAA CAG CTA TGA CCA 1 1 1 CAA 






GAC CAG CCT GAC CAA C 




CA7.CG1. 


ca7cg1.m11A 


TCC CTG TTG AAA TTC CAA CCT 


133 


m11 










ca7cg1.m11P 


CAT AGA AAT TCT CAC CTA CCC A 


134 




ca7cg1.m11B 


GTT TTC CCA GTC ACG ACG CCA AGG 


135 




TP A TP P TAT OTA PAP 

TGA TGG TAT b 1 A bAb 






ca7cg1.m11Q 


a r"\ aaa a ^> PT A TP A P P A TT P T A A 

AGG AAA CAG CTA TGA CCA TTG TAA 






ATG GAT CTT GAA GAT CAT 




CA7.CG1. 


ca7cg1.m12A 


GCA CAG AGC ACA TTC TGG TGA 


137 


m12 










ra7rn1 m12P 


TCC CAA AGA AAA CTA CTA GCC 


138 




ca7cg1.m12B 


GTT TTC CCA GTC ACG ACG CTG ATG 


139 




ATC ACA GTC TCT AAG 






ca7cg1.m.12Q . 


AGG AAA CAG CTA TGA CCA TCC AGC 


A A A 

140 




AAA GTT GTT GTT GGTT 




CA7.CG1. 


ca7.CG01.13A 


AGA CAG TTG GTA TTT AGG GA 


AAA 

141 


m13 










ca7 CG01 13P 


TCA TTA TTG CAT TTT CTG GA 


142 




ca7.CG01.13B 


GTT TTC CCA GTC ACG ACG AGC CAT 


143 






TTT PPT OTP TP P A 

TTT CCT CTC 7CU A 






ca7.CG01.13Q 


AGG AAA CAG CTA TGA CCA TGG GCT 


AAA 

144 






TCT TTT CCA CTT CAA 




CA7.CG1. 


ca7cg1.m14A 


CAA CCA AAC TAT TAT GAA ACC G 


AAC 

145 


m14 










cnlc.a^ m14P 


AGT GGG GAG CCA GTG CTG TTA 


146 




ca7cg1.m14B 


GTT TTC CCA GTC ACG ACG TTA TAA 


147 




TAA TP A PT A PAP ATA 

I AA 1 CA U 1 A oAo A 1 A oo 






ca7cg1.m14Q 


A aaa a ^>T A T/"^ A PP A T A A T^^"T 

AGG AAA CAG CTA TGA CCA TAA TCT 


148 




TGT ATG TTC TCC CAG G 




/"> a "7 P P A 

CA7.CG1. 


ca/cgi .mi oa 


TTr; rvrr; r;PA ftTA GAC TGT GGT 

1 IO OIO Own O 1 P« >-?P w i w i wv-* i 


149 


m15 






150 




ca7cg1.m15P 


GAC AGC TAT TAC TCA AAT GTC A 




ca7cg1.m15B 


GTT TTC CCA GTC ACG ACG TAA GAT 


151 




TTT GCT ACG CAA ACT GT 
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ca7cg1.m15Q 


AGG AAA CAG CTA TGA CCA TAG AGA 
CCC GAG TAA GCA TAG T 


152 


CA7.CG1. 
m16 


ca7cg1.m16A 

ca7cg1.m16P 
ca7cg1.m16B 


TGG ACA AGT CAA TGC ACT ACT G 

TGA TTT AAG CTG CCC AGA TTT C 
GTT TTC CCA GTC ACG ACG TCT TCT 
TTA GTT GAG AGA ACC T 


153 

154 . 
155 




ca7cg1.m16Q 
ca7cg1.m16C 
ca7cg1.m16R 


AGG AAA CAG CTA TGA CCA TGG AGC 

CAT GTT GGG CAC AGT 
GTT TTC CCA GTC ACG ACG ACA GCT 

ATG AAA TAG AAC AGA G 
AGG AAA CAG CTA TGA CCA TGC ATA 

CGT GCA GCA ACA GAG A 


156 
157 
158 


CA7.Col. 

m17 


ca/cgn .mi / a i 

ra7rn1 m17n1 

ca7cg1.m17b1 
ca7cglm17q1 


TTR GTC TCA GAA ATA ATC TTA CTG G 

GGA TGT AGC ACC TTG AAA TCA TTC 
GTT TTC CCA GTC ACG ACG AGC CTA 

TGG ATG TAT TTA TTC AGT TA 
AGG AAA CAG CTA TGA CCA TGT TCC 
ATT CGT TTC CTA TCA TTA G 


159 

160 
161 

162 


ca7cg1.m 
18 


ca7cgl .mloA 


AAA AAA ATC AAT AAT ATG 


163 


ca7cg1.m18B 
ca7cg1.m18Q 


CAT TGC CCA CCT GTC TAA C 
GTT TTC CCA GTC ACG ACG AAG ATT 

GTT AAA TGC TAC TGC 
AGG AAA CAG CTA TGA CCA TTA TCA 
i CTA TTC CCC TTG GC 


164 
165 

166 


ca7cg1.m 
' 19 


ca/cgi.miyA 


(XCXIk ATG TGG AGT AAT GTA AAC 


167 


r»7ra1 m19P 
ca7cg1.m19B 

ca7cg1.m19Q 


CAC CAT GTT GAA ATT AAG CAG 
GTT TTC CCA GTC ACG ACG GTA ATT 

GTT GAT AGT CCT CTG 
AGG AAA CAG CTA TGA CCA TCA TAA 
AAC CAA AGC ATC CG 


168 
169 

170 


ca7cg1.m20
ca7cg1.m20A    ATT TGC TGT CAC ATT ACC CTG    171
ca7cg1.m20P    CAG CCT GCC TGG GTG ACA G    172
ca7cg1.m20B    GTT TTC CCA GTC ACG ACG TGT CAC ATT ACC CTG TTT ATC    173
ca7cg1.m20Q    AGG AAA CAG CTA TGA CCA TTA AGA AGA GGT GAT ATT ACT TAC    174
ca7cg1.m21
ca7cg1.m21A    CTA TTG TAA TGA ATG CTG CTG    175
ca7cg1.m21P    CAG AAG ATT ATC GTG GTC ATC    176
ca7cg1.m21B    GTT TTC CCA GTC ACG ACG ATC AAG TGA CTC CTA ACC CTG    177
ca7cg1.m21Q    AGG AAA CAG CTA TGA CCA TCG TGG TCA TCA TAA ACT AAA TAC    178


ca7cg1.m22
ca7cg1.m22A     AAC TTT GAG TCT GTA GGT TGT TC    179
ca7cg1.m22P     AGA TGA GCA GCC CAC TAT TG    180
ca7cg1.m22B     GTT TTC CCA GTC ACG ACG CCA TTT GTT GAA GAA AAG TTA AG    181
ca7cg1.m22Q     AGG AAA CAG CTA TGA CCA TCA GAA AAG GCT GGA CAA CTT G    182
ca7cg1.m22C1    GTT TTC CCA GTC ACG ACG CAA CTA TTC ATC TCT TAT CTA CC    183
ca7cg1.m22R1    AGG AAA CAG CTA TGA CCA TTG AGC AGC CCA CTA TTG ATT TC    184


ca7cg1.m23
ca7cg1.m23A    GAA TGG AAT AAG TTA AAT CTT TG    185
ca7cg1.m23P    TAT CTG AAA AAC TAA TAA GCC AG    186
ca7cg1.m23B    GTT TTC CCA GTC ACG ACG TTG CTT TCT ACT CAG AGT CTA TG    187
ca7cg1.m23Q    AGG AAA CAG CTA TGA CCA TAC TAA CAT AAT TGG CTA ATG GC    188


ca7cg1.m24
ca7cg1.m24A    CAG GAT TAT ACT TTC ACT CAA G    189
ca7cg1.m24P    GAC ATT TAA CTT AAT TTC ACT TG    190
ca7cg1.m24B    GTT TTC CCA GTC ACG ACG ATA GAC TCA AGA AAA ATG CTA AG    191
ca7cg1.m24Q    AGG AAA CAG CTA TGA CCA TCT CCT TGT TAT TTC TAA ACC AG    192


ca7cg1.m25
ca7cg1.m25A    TTG TCT ACC TGA ACC CCG AG    193
ca7cg1.m25P    CAA AAT GGG GCT TGA TTA GG    194
ca7cg1.m25B    GTT TTC CCA GTC ACG ACG TAC CTT TCT GTG CGT GAT AGC    195
ca7cg1.m25Q    AGG AAA CAG CTA TGA CCA TTT AGG GCT CAA ACT GAA ATG G    196
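The tailed primers above share common prefixes. Every B and C primer begins with the same 18 bases (GTT TTC CCA GTC ACG ACG). Every Q and R primer begins with the same 19 bases (AGG AAA CAG CTA TGA CCA T). These appear to be the M13 forward and reverse tails used for dye-primer sequencing.

The short Python sketch below separates a listed primer into its tail and its gene-specific portion. It assumes exactly these two prefixes. The names split_primer, M13_FORWARD_TAIL and M13_REVERSE_TAIL are hypothetical and are not part of the described protocol.

```python
# Hypothetical helper: split a tailed mutation-screening primer into its
# shared tail and its gene-specific portion. The two prefixes are read off
# the primer table above (an assumption, not names used in the protocol).
M13_FORWARD_TAIL = "GTTTTCCCAGTCACGACG"   # prefix of the B (and C) primers
M13_REVERSE_TAIL = "AGGAAACAGCTATGACCAT"  # prefix of the Q (and R) primers


def split_primer(sequence: str) -> tuple[str, str]:
    """Return (tail_name, gene_specific_part) for a primer sequence.

    Spaces used for codon-style grouping in the table are ignored.
    Untailed primers (A and P) are returned with tail_name "none".
    """
    seq = sequence.replace(" ", "").upper()
    if seq.startswith(M13_FORWARD_TAIL):
        return "M13 forward", seq[len(M13_FORWARD_TAIL):]
    if seq.startswith(M13_REVERSE_TAIL):
        return "M13 reverse", seq[len(M13_REVERSE_TAIL):]
    return "none", seq
```

For example, splitting ca7cg1.m20B leaves TGTCACATTACCCTGTTTATC. Its first 15 bases match the 3' end of ca7cg1.m20A; the sketch itself does not check for such overlaps.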
cDNA: Total RNA prepared from either tumor cell lines or prostate kindred lymphocytes was treated with DNase I (Boehringer Mannheim) to remove contaminating genomic DNA. It was then reverse transcribed to heteroduplex cDNA using a mix of N10 random primers and a tailed oligo dT primer with Superscript II reverse transcriptase (Life Technologies). This cDNA was used as the template for nested PCR amplifications to generate the cDNA PCR products of the candidate genes that were screened for HPC1 mutations.

Using the outer primer pair for each amplicon, 10 ng of cDNA was subjected to a 20-cycle primary amplification. The PCR products were then diluted 100-fold and reamplified using nested M13-tailed primers for another 25-30 cycles. The cDNAs were amplified by hot-start PCR using TaqPlus DNA polymerase (Stratagene). The conditions were an initial denaturation step at 95°C for 30 s, followed by a pause at 80°C while the polymerase/nucleotide mixture was added to the template/primer mixtures. The hot start was followed by cycles of denaturation at 96°C (4 s), annealing at 55°C (10 s) and extension at 72°C (60 s).

PCR products were gel purified and then sequenced with M13 forward or reverse fluorescent energy transfer (FET) dye-labeled primers on ABI 377 sequencers. The sequences of these products were analyzed in GDE to determine their exon structure. Chromatograms were analyzed for the presence of polymorphisms or sequence aberrations in either the Macintosh program Sequencher (Gene Codes) or the Java program Mutscreen (Myriad, proprietary).

Table

Oligonucleotides Used for Mutation Screening from cDNA
Name            Sequence                                   SEQ ID NO

ca7.CG01.13C    AGCCATTTTCCTCTCTCCA                        197
ca7.CG01.13D    GTTTTCCCAGTCACGACGCCACCACATACCACACTTC      198

ca7.CG01.1C     CAGAATCGCATCAGTAATAGA                      199
ca7.CG01.1D     GTTTTCCCAGTCACGACGTGAAGACCTCTTTGAATTATC    200

ca7.CG01.2R     GAAGCTGTGTTCTTTTTTCA                       201

ca7.CG01.2S     AGGAAACAGCTATGACCATCTGTGTTC                202
EXAMPLE 9
Analysis of HPC1 Mutations

The DNA samples that were screened for HPC1 mutations were extracted from blood or tumor samples. They came from patients with prostate or ovarian cancer (or known carriers by haplotype analysis) who were participating in research studies on the genetics of prostate cancer. All subjects signed appropriate informed consent.

Role of HPC1 in Cancer. Most tumor predisposition genes identified to date give rise to protein products that are absent, nonfunctional, or reduced in function. The majority of TP53 mutations are missense. Some of these have been shown to produce abnormal p53 molecules that interfere with the function of the wild-type product (Shaulian et al., 1992; Srivastava et al., 1993). A similar dominant negative mechanism of action has been proposed for some adenomatous polyposis coli (APC) alleles that produce truncated molecules (Su et al., 1993). It has also been proposed for point mutations in the Wilms' tumor gene (WT1) that alter DNA binding of the protein (Little et al., 1993).

The sequence of HPC1 has been determined. Twenty-five exons have been sequenced, and several of the alternative splice variants of the transcript have been determined. SEQ ID NOs:1-52 show the sequence of HPC1, including exons and flanking genomic sequence.

The exon names, in their order in the genomic DNA sequence in the direction of transcription of the gene, are as follows: g1m20, g2m13, g3m1, [g4m17a, g4m17b], g5m19, g6m18, g7m10, g8m11, g9m21, g10m2, g11m3, g12m14, g13m22, g14m12, g15m4, g16m23, g17m24, g18m25, [g19m5a, g19m5b], g20m6, g21m7, g22m15, g23m9, g24m16, g25m8.

The remaining sequences include the same exons plus surrounding intron; e.g., SEQ ID NO:2 includes SEQ ID NO:1 within it. SEQ ID NOs:7 and 8 are alternate forms of a single exon, and both are included within SEQ ID NO:9. Similarly, SEQ ID NOs:38 and 39 are alternate forms of a single exon, and both are included within SEQ ID NO:40.

Certain rules concerning splicing have been determined:

- Each transcript begins with exon g1m20 or exon g2m13. Only one of these two exons is present; e.g., if exon g1m20 is present, then exon g2m13 is absent.
- Each transcript terminates with one of the exons g4m17a, g22m15, g24m16 or g25m8.
- Exons g4m17a and g4m17b are alternate forms, and only one of them may be present; e.g., if exon g4m17a is present, then exon g4m17b is absent.
- Similarly, exons g19m5a and g19m5b are alternate forms, and only one of them may be present; e.g., if exon g19m5a is present, then exon g19m5b is absent.

The exons g4m17a, g22m15 and g24m16 have poly A sequences.
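The splicing rules above can be expressed as a simple check. The following Python sketch is a hypothetical helper (is_valid_transcript and the constant names are illustrative), not part of the described method. Beyond the stated rules, it assumes that each exon occurs at most once and in the genomic order listed earlier.

```python
# Exon names in genomic order, in the direction of transcription (list above).
GENOMIC_ORDER = [
    "g1m20", "g2m13", "g3m1", "g4m17a", "g4m17b", "g5m19", "g6m18", "g7m10",
    "g8m11", "g9m21", "g10m2", "g11m3", "g12m14", "g13m22", "g14m12", "g15m4",
    "g16m23", "g17m24", "g18m25", "g19m5a", "g19m5b", "g20m6", "g21m7",
    "g22m15", "g23m9", "g24m16", "g25m8",
]
RANK = {name: i for i, name in enumerate(GENOMIC_ORDER)}
FIRST_EXONS = {"g1m20", "g2m13"}                      # a transcript starts with one
LAST_EXONS = {"g4m17a", "g22m15", "g24m16", "g25m8"}  # and ends with one of these
ALTERNATE_PAIRS = [("g4m17a", "g4m17b"), ("g19m5a", "g19m5b")]


def is_valid_transcript(exons: list[str]) -> bool:
    """Check an exon list against the stated splicing rules.

    Assumption (not stated as a rule in the text): exons appear at most
    once and in genomic order.
    """
    if not exons or any(e not in RANK for e in exons):
        return False
    ranks = [RANK[e] for e in exons]
    if ranks != sorted(set(ranks)):  # equal only if strictly increasing
        return False
    present = set(exons)
    if exons[0] not in FIRST_EXONS or len(present & FIRST_EXONS) != 1:
        return False
    if exons[-1] not in LAST_EXONS:
        return False
    # At most one member of each alternate-exon pair may be present.
    return all(not (a in present and b in present) for a, b in ALTERNATE_PAIRS)
```

Under these assumptions, every combination listed in Table 11 passes the check. A transcript containing both g1m20 and g2m13, by contrast, fails.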

In studying several kindreds, two mutations that were associated with cancer were found. These are shown in Table 9.

Two polymorphisms that are in disequilibrium were also found, as shown in Table 10. These occur at base 207 of SEQ ID NO:2 and at base 158 of SEQ ID NO:10.

There are many potential combinations of exons that could be spliced together to form an HPC1 transcript. Examples of combinations of spliced exons that follow the rules described above (as determined from sequencing many transcripts) are shown in Table 11. These examples do not include all of the exon combinations that have been found or that are possible; many additional combinations can be determined by applying the above rules.

SEQ ID NOs:203-210 show some putative polypeptides that are obtained from some of the mRNA variants.
Table 11

Exon content

1. g1m20 g4m17a
2. g1m20 g3m1 g4m17a
3. g1m20 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
4. g1m20 g3m1 g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
5. g1m20 g3m1 g4m17b g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
6. g1m20 g3m1 g4m17b g5m19 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
7. g1m20 g3m1 g4m17b g5m19 g6m18 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
8. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
9. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
10. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
11. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
12. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
13. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
14. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
15. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
16. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
17. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
18. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g19m5a g20m6 g21m7 g23m9 g25m8
19. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g20m6 g21m7 g23m9 g25m8
20. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g21m7 g23m9 g25m8
21. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g23m9 g25m8
22. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g25m8
23. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
24. g1m20 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
25. g1m20 g3m1 g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
26. g1m20 g3m1 g4m17b g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
27. g1m20 g3m1 g4m17b g5m19 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
28. g1m20 g3m1 g4m17b g5m19 g6m18 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
29. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
30. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
31. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
32. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
33. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
34. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
35. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
36. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
37. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
38. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
39. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g19m5b g20m6 g21m7 g23m9 g25m8
40. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g20m6 g21m7 g23m9 g25m8
41. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g21m7 g23m9 g25m8
42. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g23m9 g25m8
43. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g25m8
44. g1m20 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
45. g2m13 g4m17a
46. g2m13 g3m1 g4m17a
47. g2m13 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
48. g2m13 g3m1 g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
49. g2m13 g3m1 g4m17b g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
50. g2m13 g3m1 g4m17b g5m19 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
51. g2m13 g3m1 g4m17b g5m19 g6m18 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
52. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
53. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
54. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
55. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
56. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
57. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
58. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
59. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
60. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
61. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
62. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g19m5a g20m6 g21m7 g23m9 g25m8
63. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g20m6 g21m7 g23m9 g25m8
64. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g21m7 g23m9 g25m8
65. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g23m9 g25m8
66. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g25m8
67. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5a g20m6 g21m7 g23m9 g25m8
68. g2m13 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
69. g2m13 g3m1 g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
70. g2m13 g3m1 g4m17b g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
71. g2m13 g3m1 g4m17b g5m19 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
72. g2m13 g3m1 g4m17b g5m19 g6m18 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
73. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
74. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
75. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
76. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
77. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
78. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
79. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
80. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
81. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
82. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8
83. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g19m5b g20m6 g21m7 g23m9 g25m8
84. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g20m6 g21m7 g23m9 g25m8
85. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g21m7 g23m9 g25m8
86. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g23m9 g25m8
87. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g25m8
88. g2m13 g3m1 g4m17b g5m19 g6m18 g7m10 g8m11 g9m21 g10m2 g11m3 g12m14 g13m22 g14m12 g15m4 g16m23 g17m24 g18m25 g19m5b g20m6 g21m7 g23m9 g25m8



EXAMPLE 10

Polymorphisms in HPC1

In the course of determining the sequence of the HPC1 gene from the many kindred samples, as well as from tumor and cell line samples, two mutations (see Table 9) and several polymorphisms have been found. As noted previously, two of the polymorphisms found to date show linkage disequilibrium. These and other polymorphisms are shown in Table 12. Frequencies have been tested in controls, tumor cell lines (TCLs) and family members of kindreds with cancer.

Table 12 
Polymorphisms in HPC1 

1) A G/A polymorphism is located at base 68 of SEQ ID NO:3. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous G        78       26         13
  Heterozygous        34        6          1
  Homozygous A         6        0          0

2) A C/T polymorphism is located at base 186 of SEQ ID NO:4. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous C        85       27         10
  Heterozygous        29        3          4
  Homozygous T         4        2          0

3) A C/G polymorphism is located at base 29 of SEQ ID NO:10. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members   Prostate - [illegible]
  Homozygous C       111       31         14            Not tested
  Heterozygous         7        0          0            Not tested
  Homozygous G         0        1          0            Not tested

4) A G/T polymorphism is located at base 66 of SEQ ID NO:9. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous G       108       31         10
  Heterozygous        10        0          4
  Homozygous T         0        1          0

5) An A/T polymorphism is located at base 158 of SEQ ID NO:10. The T allele of this polymorphism occurs at a significantly higher frequency in the TCLs than in the controls and is thus in disequilibrium with "the state of being a tumor cell line." The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous A        97       20         10
  Heterozygous        19        6          4
  Homozygous T         2        6          0

6) A C/T polymorphism is located at base 31 of SEQ ID NO:19 and also at base 31 of SEQ ID NO:20 (these are the same location). The T variant of this polymorphism is actually a mutation, which was seen in a glioma tumor cell line (DBTRG-05MG); elsewhere, only the homozygous C form has been seen. The T allele of this variant results in the non-conservative missense change histidine to tyrosine in the most likely reading frame of this exon. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous C       118       31         14
  Heterozygous         0        1          0
  Homozygous T         0        0          0

7) An A/T polymorphism is located at base 173 of SEQ ID NO:25. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls     TCLs   Family Members
  Homozygous A    Not tested     24          5
  Heterozygous    Not tested      4          9
  Homozygous T    Not tested      4          0

8) An A/G polymorphism is located at base 183 of SEQ ID NO:25. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls     TCLs   Family Members
  Homozygous A    Not tested     25         14
  Heterozygous    Not tested      3          0
  Homozygous G    Not tested      4          0

9) A C/T polymorphism is located at base 73 of SEQ ID NO:28. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls     TCLs   Family Members
  Homozygous C    Not tested     22         11
  Heterozygous    Not tested      8          3
  Homozygous T    Not tested      2          0

10) A G/T polymorphism is located at base 324 of SEQ ID NO:28. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls     TCLs   Family Members
  Homozygous G    Not tested     30         14
  Heterozygous    Not tested      2          0
  Homozygous T    Not tested      0          0



11) A G/A polymorphism is located at base 64 of SEQ ID NO:1. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous G        22       21          6
  Heterozygous        11        3          7
  Homozygous A         0        8          1



12) A T/A polymorphism is located at base 207 of SEQ ID NO:2. The A allele of this polymorphism occurs at a significantly higher frequency in the TCLs than in the controls and is thus in disequilibrium with "the state of being a tumor cell line." The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls   TCLs   Family Members
  Homozygous T        77       22         10
  Heterozygous        17        4          4
  Homozygous A         0        5          0

13) A G/A polymorphism is located at base 42 of SEQ ID NO:28. The A variant of this polymorphism is actually a germline mutation; it was not observed in the samples tabulated below, all of which were homozygous G. The A allele of this variant causes the non-conservative missense change glycine to glutamate in the most likely reading frame of this exon. The numbers of occurrences of the three genotypes in the various test groups are:

                  Controls     TCLs   Family Members
  Homozygous G    Not tested     31         14
  Heterozygous    Not tested      0          0
  Homozygous A    Not tested      0          0
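The comparisons in Table 12 can be made concrete by converting genotype counts into allele counts. The Python sketch below does this for polymorphism 5. It then computes a Pearson chi-square statistic on the 2 × 2 table of allele counts (controls versus TCLs).

This is an illustrative choice. The text does not state which statistical test supports "significantly higher frequency," and an allele-count test assumes that the two alleles of each individual can be treated as independent observations. The function names are hypothetical.

```python
def allele_counts(hom_ref: int, het: int, hom_alt: int) -> tuple[int, int]:
    """Convert genotype counts into (reference, alternate) allele counts."""
    return 2 * hom_ref + het, het + 2 * hom_alt


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Polymorphism 5 (A/T at base 158 of SEQ ID NO:10), genotype counts from the table.
controls = allele_counts(97, 19, 2)   # (A alleles, T alleles)
tcls = allele_counts(20, 6, 6)        # (A alleles, T alleles)

t_freq_controls = controls[1] / sum(controls)
t_freq_tcls = tcls[1] / sum(tcls)
chi2 = chi_square_2x2(*controls, *tcls)  # rows: controls, TCLs; columns: A, T
print(f"T allele: controls {t_freq_controls:.3f}, TCLs {t_freq_tcls:.3f}, chi2 = {chi2:.1f}")
```

With the listed counts, the T allele frequency is about 0.10 in controls and about 0.28 in TCLs. The statistic (about 14.4 with 1 degree of freedom) exceeds the 3.84 critical value for p = 0.05.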
EXAMPLE 11
Analysis of the HPC1 Gene

The structure and function of the HPC1 gene are determined according to the following methods.

Biological Studies. Mammalian expression vectors containing HPC1 cDNA are constructed and transfected into appropriate prostate carcinoma cells with lesions in the gene. Wild-type HPC1 cDNA as well as altered HPC1 cDNA are utilized. The altered HPC1 cDNA can be obtained from altered HPC1 alleles or produced as described below. Phenotypic reversion in cultures (e.g., cell morphology, doubling time, anchorage-independent growth) and in animals (e.g., tumorigenicity) is examined. The studies will employ both wild-type and mutant forms of the gene.

Molecular Genetics Studies. In vitro mutagenesis is performed to construct deletion 
mutants and missense mutants (by single base-pair substitutions in individual codons and alanine 
scanning mutagenesis). The mutants are used in biological, biochemical and biophysical studies. 

Mechanism Studies. The ability of HPC1 protein to bind to known and unknown DNA sequences is examined. Its ability to transactivate promoters is analyzed by transient reporter expression systems in mammalian cells. Conventional procedures such as particle capture and the yeast two-hybrid system are used to discover and identify any functional partners. The nature and functions of the partners are characterized. These partners in turn are targets for drug discovery.

Structural Studies. Recombinant proteins are produced in E. coli, yeast, insect and/or mammalian cells and are used in crystallographic and NMR studies. Molecular modeling of the proteins is also employed. These studies facilitate structure-driven drug design.

EXAMPLE 12

Generation of Polyclonal Antibody against HPC1
Segments of HPC1 coding sequence are expressed as fusion proteins in E. coli. The overexpressed proteins are purified by gel elution and used to immunize rabbits and mice, using a procedure similar to the one described by Harlow and Lane, 1988. This procedure has been shown to generate Abs against various other proteins (for example, see Kraemer et al., 1993).
Briefly, a stretch of HPC1 coding sequence was cloned as a fusion protein in plasmid pET5a (Novagen, Inc., Madison, WI). The incorporated HPC1 sequences might include SEQ ID NOs:1, 3, 7, 9, 11, 13, 15, 17, 19, 20, 22, 24, 26, 28, 30 or combinations thereof. After induction with IPTG, the overexpression of a fusion protein with the expected molecular weight is verified by SDS/PAGE. Fusion proteins are purified from the gel by electroelution. The identity of the protein as the HPC1 fusion product is verified by protein sequencing at the N-terminus.

Next, the purified protein is used as immunogen in rabbits. Rabbits are immunized with 100 µg of the protein in complete Freund's adjuvant and boosted twice at 3-week intervals: first with 100 µg of immunogen in incomplete Freund's adjuvant, and then with 100 µg of immunogen in PBS. Antibody-containing serum is collected two weeks thereafter.

This procedure can be repeated to generate antibodies against mutant forms of the HPC1 
protein. These antibodies, in conjunction with antibodies to wild type HPC1, are used to detect 
the presence and the relative level of the mutant forms in various tissues and biological fluids. 

EXAMPLE 13

Generation of Monoclonal Antibodies Specific for HPC1
Monoclonal antibodies are generated according to the following protocol. Mice are immunized with immunogen comprising intact HPC1 or HPC1 peptides (wild type or mutant) conjugated to keyhole limpet hemocyanin using glutaraldehyde or EDC, as is well known.

The immunogen is mixed with an adjuvant. Each mouse receives four injections of 10 to 100 µg of immunogen. After the fourth injection, blood samples are taken from the mice to determine whether the serum contains antibody to the immunogen. Serum titer is determined by ELISA or RIA. Mice with sera indicating the presence of antibody to the immunogen are selected for hybridoma production.

Spleens are removed from immune mice and a single cell suspension is prepared (see Harlow and Lane, 1988). Cell fusions are performed essentially as described by Kohler and Milstein, 1975. Briefly, P3.65.3 myeloma cells (American Type Culture Collection, Rockville, MD) are fused with immune spleen cells using polyethylene glycol as described by Harlow and Lane, 1988. Cells are plated at a density of 2 × 10⁵ cells/well in 96-well tissue culture plates. Individual wells are examined for growth, and the supernatants of wells with growth are tested for the presence of HPC1-specific antibodies by ELISA or RIA, using wild-type or mutant HPC1 target protein. Cells in positive wells are expanded and subcloned to establish and confirm monoclonality.

Clones with the desired specificities are expanded and grown as ascites in mice or in a hollow fiber system to produce sufficient quantities of antibody for characterization and assay development.

EXAMPLE 14 
Isolation of HPC1 Binding Peptides 

Peptides that bind to the HPC1 gene product are isolated from both chemical and phage-displayed random peptide libraries as follows.

Fragments of the HPC1 gene product are expressed as GST and His-tag fusion proteins in both E. coli and Sf9 cells. The fusion protein is isolated using either a glutathione matrix (for GST fusion proteins) or a nickel chelation matrix (for His-tag fusion proteins). This target fusion protein preparation is either screened directly as described below, or eluted with glutathione or imidazole. The target protein is immobilized to a surface such as polystyrene, a resin such as agarose, or other solid supports. Immobilization uses direct adsorption, covalent linkage reagents such as glutaraldehyde, or linkage agents such as biotin-avidin.

Two types of random peptide libraries of varying lengths are generated: synthetic peptide
libraries that may contain derivatized residues, for example by phosphorylation or myristylation,
and phage-displayed peptide libraries which may be phosphorylated. These libraries are
incubated with immobilized HPC1 gene product in a variety of physiological buffers. Next,
unbound peptides are removed by repeated washes, and bound peptides are recovered by a variety
of elution reagents such as low or high pH, strong denaturants, glutathione, or imidazole.
Recovered synthetic peptide mixtures are sent to commercial services for peptide
microsequencing to identify enriched residues. Recovered phage are amplified, rescreened,
plaque-purified, and then sequenced to determine the identity of the displayed peptides.
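Identifying enriched residues from a set of sequenced peptides amounts to tabulating residue frequencies position by position. The Python sketch below assumes the recovered peptides are already aligned and of equal length, and it calls a consensus with a simple frequency threshold; a fuller analysis would compare these frequencies with those of the unselected library. The function names and peptide sequences are hypothetical.

```python
from collections import Counter


def position_frequencies(peptides):
    """Per-position residue frequencies for aligned, equal-length peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    table = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        table.append({aa: n / len(peptides) for aa, n in counts.items()})
    return table


def consensus(table, threshold=0.5):
    """Most frequent residue at each position, or 'X' if it falls below threshold."""
    motif = []
    for freqs in table:
        residue, freq = max(freqs.items(), key=lambda item: item[1])
        motif.append(residue if freq >= threshold else "X")
    return "".join(motif)


# Hypothetical inserts read from four plaque-purified phage clones.
clones = ["WPRLE", "WPKLE", "WPRLD", "YPGLS"]
table = position_frequencies(clones)
print(consensus(table))       # residues present in at least half of the clones
print(consensus(table, 0.6))  # positions below 60% are reported as X
```

Raising the threshold trades sensitivity for confidence: with these hypothetical clones the 0.5 threshold yields a full motif, while 0.6 masks the two weakly conserved positions.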
Use of HPC1 Binding Peptides

Peptides identified from the above screens are synthesized in larger quantities as biotin
conjugates by commercial services. These peptides are used in both solid and solution phase
competition assays with HPC1 and its interacting partners identified in yeast two-hybrid screens.
Versions of these peptides that are fused to membrane-permeable motifs (Lin et al., 1995; Rojas
et al., 1996) are chemically synthesized and added to cultured cells, and the effects on growth,
apoptosis, differentiation, cofactor response, and internal changes are assayed.
EXAMPLE 15
Sandwich Assay for HPC1 
Monoclonal antibody is attached to a solid surface such as a plate, tube, bead, or particle.
Preferably, the antibody is attached to the well surface of a 96-well ELISA plate. 100 µl of sample
(e.g., serum, urine, tissue cytosol) containing the HPC1 peptide/protein (wild-type or mutant) is
added to the solid phase antibody. The sample is incubated for 2 hr at room temperature. Next,
the sample fluid is decanted, and the solid phase is washed with buffer to remove unbound
material. 100 µl of a second monoclonal antibody (to a different determinant on the HPC1
peptide/protein) is added to the solid phase. This antibody is labeled with a detector molecule
(e.g., 125I, an enzyme, a fluorophore, or a chromophore), and the solid phase with the second
antibody is incubated for 2 hr at room temperature. The second antibody is decanted and the
solid phase is washed with buffer to remove unbound material.

The amount of bound label, which is proportional to the amount of HPC1 peptide/protein
present in the sample, is quantitated. Separate assays are performed using monoclonal
antibodies which are specific for the wild-type HPC1 as well as monoclonal antibodies specific
for each of the mutations identified in HPC1.
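Because bound label is proportional to the amount of HPC1 peptide/protein, a sample signal can be converted to an amount through a standard curve. The Python sketch below assumes a dilution series of purified HPC1 standards run on the same plate, which the example does not describe, and fits a straight line by ordinary least squares. The function names, units, and all values are hypothetical.

```python
def fit_standard_curve(amounts, signals):
    """Least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the fitted line to estimate the HPC1 amount in an unknown well."""
    return (signal - intercept) / slope


# Hypothetical standards (ng/ml of purified HPC1) and their bound-label readings.
standards = [0.0, 5.0, 10.0, 20.0, 40.0]
readings = [0.05, 0.30, 0.55, 1.05, 2.05]
slope, intercept = fit_standard_curve(standards, readings)
print(round(amount_from_signal(0.80, slope, intercept), 2))
```

Each wild-type or mutation-specific assay would need its own curve, and sample readings outside the range covered by the standards should not be extrapolated from a linear fit.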

EXAMPLE 16 

Two-hybrid Assay to Identify 
Proteins that Interact with HPC1

Sequences encoding all or portions of HPC1 are ligated into pAS2-1 (Clontech) such that
the coding sequence of HPC1 is in-frame with the coding sequence for the GAL4p DNA-binding
domain. This plasmid construct is introduced into the yeast reporter strain Y190 by
transformation. A library of activation domain fusion plasmids prepared from human prostate
cDNA (Clontech) is then introduced into strain Y190 carrying the pAS2-1-based fusion
construct. Transformants are spread onto 20 150-mm plates of yeast minimal media lacking
leucine, tryptophan, and histidine, and containing 25 mM 3-amino-1,2,4-triazole. After one
week of incubation at 30°C, yeast colonies are assayed for expression of the lacZ reporter gene
by β-galactosidase filter assay. Colonies that both grow in the absence of histidine and are
positive for production of β-galactosidase are chosen for further characterization.

The activation domain plasmid is purified from positive colonies by the smash-and-grab
technique. These plasmids are introduced into E. coli DH5α by electroporation and purified
from E. coli by the alkaline lysis method. To test for the specificity of the interaction, specific
activation domain plasmids are cotransformed into strain Y190 with plasmids encoding various
DNA-binding domain fusion proteins, including fusions to HPC1 and human lamin C.
Transformants from these experiments are assayed for expression of the HIS3 and lacZ reporter
genes. Positives that express reporter genes with HPC1 constructs and not with lamin C
constructs encode bona fide HPC1 interacting proteins. These proteins are identified and
characterized by sequence analysis of the insert of the appropriate activation domain plasmid.
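The two selection rules in this example (primary positives must both grow without histidine and be β-galactosidase positive; bona fide interactors must activate the reporters with the HPC1 bait but not with the lamin C control) can be written as simple filters. The Python sketch below is a bookkeeping aid under two stated assumptions: a retest counts as positive for a bait only if both HIS3 and lacZ are expressed, and any reporter activity with lamin C disqualifies a clone. The `Colony` class, the function names, and the clone names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Colony:
    name: str
    grows_without_his: bool  # growth on medium lacking histidine (HIS3 reporter)
    lacz_positive: bool      # positive in the beta-galactosidase filter assay


def primary_positives(colonies):
    """Colonies that satisfy both reporter readouts in the library screen."""
    return [c.name for c in colonies if c.grows_without_his and c.lacz_positive]


def bona_fide_interactors(retests):
    """Clones positive with the HPC1 bait and negative with the lamin C control.

    retests maps clone name -> {bait name: (his3_expressed, lacz_expressed)}.
    """
    hits = []
    for clone, by_bait in retests.items():
        positive_with_hpc1 = all(by_bait["HPC1"])    # both reporters on
        active_with_lamin = any(by_bait["lamin C"])  # any reporter on
        if positive_with_hpc1 and not active_with_lamin:
            hits.append(clone)
    return hits


screen = [
    Colony("clone-1", True, True),
    Colony("clone-2", True, False),
    Colony("clone-3", False, True),
    Colony("clone-4", True, True),
]
print(primary_positives(screen))

retests = {
    "clone-1": {"HPC1": (True, True), "lamin C": (False, False)},
    "clone-4": {"HPC1": (True, True), "lamin C": (True, True)},  # nonspecific
}
print(bona_fide_interactors(retests))
```

In this hypothetical run, clone-4 passes the primary screen but is discarded at the retest because it also activates the reporters with the lamin C control.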

This procedure is repeated with mutant forms of the HPC1 gene, to identify proteins that
interact with only the mutant protein or to determine whether a mutant form of the HPC1 protein
can or cannot interact with a protein known to interact with wild-type HPC1.
WHAT IS CLAIMED IS:

1. An isolated nucleic acid selected from the group consisting of a nucleic acid having SEQ
ID NO:1, a nucleic acid having SEQ ID NO:3, a nucleic acid having SEQ ID NO:5, a
nucleic acid having SEQ ID NO:7, a nucleic acid having SEQ ID NO:8, a nucleic acid
having SEQ ID NO:10, a nucleic acid having SEQ ID NO:12, a nucleic acid having SEQ
ID NO:14, a nucleic acid having SEQ ID NO:16, a nucleic acid having SEQ ID NO:18, a
nucleic acid having SEQ ID NO:20, a nucleic acid having SEQ ID NO:22, a nucleic acid
having SEQ ID NO:24, a nucleic acid having SEQ ID NO:26, a nucleic acid having SEQ
ID NO:28, a nucleic acid having SEQ ID NO:30, a nucleic acid having SEQ ID NO:32, a
nucleic acid having SEQ ID NO:34, a nucleic acid having SEQ ID NO:36, a nucleic acid
having SEQ ID NO:38, a nucleic acid having SEQ ID NO:39, a nucleic acid having SEQ
ID NO:41, a nucleic acid having SEQ ID NO:43, a nucleic acid having SEQ ID NO:45, a
nucleic acid having SEQ ID NO:47, a nucleic acid having SEQ ID NO:49, and a nucleic
acid having SEQ ID NO:51.

2. An isolated nucleic acid selected from the group consisting of a nucleic acid having SEQ
ID NO:2, a nucleic acid having SEQ ID NO:4, a nucleic acid having SEQ ID NO:6, a
nucleic acid having SEQ ID NO:9, a nucleic acid having SEQ ID NO:11, a nucleic acid
having SEQ ID NO:13, a nucleic acid having SEQ ID NO:15, a nucleic acid having SEQ
ID NO:17, a nucleic acid having SEQ ID NO:19, a nucleic acid having SEQ ID NO:21, a
nucleic acid having SEQ ID NO:23, a nucleic acid having SEQ ID NO:25, a nucleic acid
having SEQ ID NO:27, a nucleic acid having SEQ ID NO:29, a nucleic acid having SEQ
ID NO:31, a nucleic acid having SEQ ID NO:33, a nucleic acid having SEQ ID NO:35, a
nucleic acid having SEQ ID NO:37, a nucleic acid having SEQ ID NO:40, a nucleic acid
having SEQ ID NO:42, a nucleic acid having SEQ ID NO:44, a nucleic acid having SEQ
ID NO:46, a nucleic acid having SEQ ID NO:48, and a nucleic acid having SEQ ID NO:52.

3. An isolated DNA coding for a wild-type HPC1 polypeptide or a portion of an HPC1
polypeptide wherein said portion is encoded by any one or more of the nucleic acids
selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 8, 10, 12, 14, 16, 18, 20, 22,
24, 26, 28, 30, 32, 34, 36, 38, 39, 41, 43, 45, 47, 49 and 51.



BNSDOCID: <WO 001 2694 A1_L> 






WO 00/12694 PCT/US99/19508 


4. An isolated DNA encoding a variant of a wild-type HPC1 polypeptide or a portion of an
HPC1 polypeptide wherein said DNA is encoded by any one or more allelic variants of
the nucleic acids selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 39, 41, 43, 45, 47, 49 and 51.

5. The isolated DNA of claim 4 wherein said allelic variant is selected from the group
consisting of the variants set forth in Tables 11 and 13.

6. The isolated DNA of any one of claims 3 to 5 which contains HPC1 regulatory sequences. 

7. An isolated DNA having at least 15 consecutive nucleotides of the isolated DNA of any
one of claims 1 to 5.

8. An isolated DNA coding for a mutated form of an HPC1 polypeptide.

9. The isolated DNA of claim 8, wherein the DNA comprises a mutated form of any one or
more of the nucleotide sequences set forth in SEQ ID NOs:1-52.

10. The isolated DNA of claim 8, wherein said mutated form is selected from the group
consisting of the mutations set forth in Table 10.

11. The isolated DNA of claim 9, wherein the mutation is selected from the group consisting of 
a deletion mutation, a nonsense mutation, an insertion mutation and a missense mutation. 

12. An isolated DNA having at least 15 consecutive nucleotides of the DNA of any one of
claims 8-11.

13. The isolated DNA of claim 12, wherein the isolated DNA overlaps the mutation. 

14. An isolated DNA selected from the group consisting of:

(1) a DNA having the nucleotide sequence set forth in SEQ ID NO:47 having an A at 
nucleotide position 42, 



(2) a DNA having the nucleotide sequence set forth in SEQ ID NO:38 having a T at
nucleotide position 31, 

(3) a DNA having the nucleotide sequence set forth in SEQ ID NO:39 having a T at 
nucleotide position 31, 

(4) a DNA having the nucleotide sequence set forth in SEQ ID NO:4 having an A at
nucleotide position 207, and

(5) a DNA having the nucleotide sequence set forth in SEQ ID NO:20 having a T at base 
158. 

15. A replicative cloning vector which comprises the isolated DNA of any one of claims 1 to 5
and 8 to 10 or parts thereof and a replicon operative in a host cell.

16. An expression system which comprises the isolated DNA of any one of claims 1 to 5 and 8 
to 10 operably linked to suitable control sequences. 

17. Recombinant host cells transformed with the expression system of claim 16.

18. A method of producing recombinant HPC1 polypeptide which comprises culturing the cells
of claim 17 under conditions effective for the production of said HPC1 polypeptide and
harvesting the recombinant HPC1 polypeptide.

19. A preparation of human HPC1 polypeptide substantially free of other human proteins.

20. A preparation of human polypeptide substantially free of other human proteins, the amino
acid sequence of said polypeptide having substantial sequence homology with a wild-type
HPC1 polypeptide, and said human polypeptide having substantially similar function to a
wild-type HPC1 polypeptide.

21. An antibody immunoreactive with a human HPC1 polypeptide or portion thereof.

22. The antibody of claim 21 which is a polyclonal antibody. 



23. The antibody of claim 21 which is a monoclonal antibody.

24. A pair of single-stranded DNA primers for determination of a nucleotide sequence of an
HPC1 gene by an amplification reaction, the sequence of said primers being derived from
human chromosome 1, wherein the use of said primers in an amplification results in the
synthesis of DNA having all or part of the sequence of the HPC1 gene.

25. The pair of primers of claim 24 wherein said HPC1 gene has a nucleotide sequence set
forth in any one or more of SEQ ID NOs:1-52.

26. A method for identifying a mutant HPC1 nucleotide sequence in a suspected mutant HPC1
allele which comprises comparing the nucleotide sequence of the suspected mutant HPC1
allele with a wild-type HPC1 nucleotide sequence, wherein a difference between the
suspected mutant and wild-type sequences identifies a mutant HPC1 nucleotide sequence.

27. A method for identifying a polymorphic HPC1 nucleotide sequence in a suspected
polymorphic HPC1 allele which comprises comparing the nucleotide sequence of the
suspected polymorphic HPC1 allele with a wild-type HPC1 nucleotide sequence of said
HPC1 gene having a nucleotide sequence set forth in any one or more of SEQ ID NOs:1-31,
wherein a difference between the suspected polymorphic and wild-type sequences
identifies a polymorphic HPC1 nucleotide sequence.

28. A method for identifying a consensus HPC1 nucleotide sequence which comprises
analyzing the sequence of at least 5 individuals having a wild-type allele and identifying
any polymorphisms and their frequency in the population examined.

29. A kit for detecting mutations in the HPC1 gene resulting in a susceptibility to prostate 
cancer which comprises at least one oligonucleotide primer specific for an HPC1 gene 
mutation and instructions relating to detecting mutations in the HPC1 gene. 

30. A kit for detecting mutations in the HPC1 gene resulting in a susceptibility to prostate
cancer which comprises at least one allele-specific oligonucleotide probe for an HPC1 gene 
mutation and instructions relating to detecting mutations in the HPC1 gene. 

31. A method for supplying a wild-type HPC1 gene function or an HPC1 function substantially
similar to the wild-type to a cell which has lost said gene function or has altered gene
function by virtue of a mutation in an HPC1 gene, wherein said method comprises:
introducing into the cell a nucleic acid which suppresses a transformed state of said cell,
said nucleic acid selected from the group consisting of a wild-type HPC1 gene nucleic acid,
a portion of said wild-type HPC1 gene nucleic acid, a nucleic acid substantially
homologous to and having substantially similar function to said wild-type HPC1 gene
nucleic acid, and a portion of said nucleic acid substantially homologous to said wild-type
HPC1 gene nucleic acid.



32. The method of claim 31 wherein said nucleic acid is a wild-type HPC1 gene nucleic acid.

33. The method of claim 31 or 32 wherein said nucleic acid contains HPC1 gene regulatory
sequences.

34. The method of any one of claims 31 to 33 wherein said nucleic acid is incorporated into the
genome of said cell.

35. A method for supplying a wild-type HPC1 gene function or an HPC1 function substantially
similar to the wild-type to a cell which has lost said gene function or has altered gene
function by virtue of a mutation in the HPC1 gene, comprising: introducing into the cell a
molecule which suppresses a transformed state of said cell, said molecule selected from the
group consisting of a wild-type HPC1 polypeptide, a portion of said wild-type HPC1
polypeptide, a polypeptide substantially homologous to said wild-type HPC1 polypeptide,
a portion of said polypeptide substantially homologous to said wild-type HPC1 polypeptide
and a molecule which mimics the function of said wild-type HPC1 polypeptide.

36. The method of claim 35 wherein said molecule is a wild-type HPC1 polypeptide. 
37. A method for screening potential cancer therapeutics which comprises: combining (i) an
HPC1 binding partner, (ii) an HPC1 polypeptide having a portion of said amino acid
sequence which binds to said binding partner and (iii) a compound suspected of being a
cancer therapeutic and determining the amount of binding of the HPC1 polypeptide to its
binding partner.

38. A method for screening potential cancer therapeutics which comprises: combining an 
HPC1 binding partner and a compound suspected of being a cancer therapeutic and 
measuring the biological activity of the binding partner. 

39. A method for screening potential cancer therapeutics, wherein said method comprises:
growing a transformed eukaryotic host cell containing an altered HPC1 gene causing
cancer in the presence of a compound suspected of being a cancer therapeutic, growing said 
transformed eukaryotic host cell in the absence of said compound, determining the rate of 
growth of said host cell in the presence of said compound and the rate of growth of said 
host cell in the absence of said compound and comparing the growth rate of said host cells, 
wherein a slower rate of growth of said host cell in the presence of said compound is 
indicative of a cancer therapeutic. 

40. A method for screening potential cancer therapeutics which comprises: administering a 
compound suspected of being a cancer therapeutic to a transgenic animal which carries an 
altered HPC1 allele from a second animal in its genome and determining the development 
or growth of a cancer lesion. 

41. A transgenic animal which carries an altered HPC1 allele.

42. The transgenic animal of claim 41 wherein the altered HPC1 allele is selected from the 
group consisting of a deletion mutation, a nonsense mutation, a frameshift mutation and a 
missense mutation. 

43. The transgenic animal of claim 41 wherein the altered HPC1 allele is a disrupted allele. 
44. A method for screening the germline of a human subject for an alteration of an HPC1 gene,
wherein said method comprises comparing the germline sequence of an HPC1 gene or HPC1
RNA from a tissue sample from said subject or a sequence of HPC1 cDNA made from
mRNA from said sample or an HPC1 polypeptide isolated from said sample with germline
sequences of wild-type HPC1 gene, wild-type HPC1 RNA, wild-type HPC1 cDNA or a
wild-type HPC1 polypeptide wherein a difference in the sequence of the HPC1 gene, HPC1
RNA, HPC1 cDNA or HPC1 polypeptide of the subject from wild-type indicates an
alteration in the HPC1 gene in said subject.

45. The method of claim 44 wherein said wild-type HPC1 gene has a nucleotide sequence set
forth in any one or more of SEQ ID NOs:1-52.

46. The method of claim 44 wherein the nucleic acid sequence of HPC1 RNA from the subject
is compared to nucleic acid sequences of wild-type HPC1 gene, HPC1 RNA or HPC1
cDNA.

47. The method of claim 44 wherein the nucleic acid sequence is compared by hybridizing an
HPC1 gene probe which specifically hybridizes to an HPC1 allele to RNA isolated from
said subject and detecting the presence of a hybridization product, wherein a presence of
said product indicates the presence of said allele in the subject.

48. The method of claim 44 wherein the HPC1 polypeptide from the subject is compared to a 
wild-type HPC1 polypeptide. 

49. The method of claim 44 wherein a regulatory region of the HPC1 gene from said subject is
compared with a regulatory region of wild-type HPC1 gene sequences.

50. The method of claim 44 wherein a germline nucleic acid sequence is compared by
obtaining a first HPC1 gene fragment from an HPC1 gene from a human sample and a
second HPC1 gene fragment from a wild-type HPC1 gene, said second fragment
corresponding to said first fragment, forming single-stranded DNA from said first HPC1
gene fragment and from said second HPC1 gene fragment, electrophoresing said
single-stranded DNAs on a non-denaturing polyacrylamide gel, comparing the mobility of said
single-stranded DNAs on said gel to determine if said single-stranded DNA from said first
HPC1 gene fragment is shifted relative to said second HPC1 gene fragment and sequencing
said single-stranded DNA from said first HPC1 gene fragment having a shift in
electrophoretic mobility.

51. The method of claim 44 wherein a germline nucleic acid sequence is compared by
hybridizing an HPC1 gene probe which specifically hybridizes to an HPC1 allele to
genomic DNA isolated from said sample and detecting the presence of a hybridization
product, wherein a presence of said product indicates the presence of said allele in the
subject.

52. The method of claim 44 wherein the alteration in the germline sequence is detected by
hybridization with an allele-specific probe.

53. The method of claim 44 wherein a germline nucleic acid sequence is compared by 
amplifying all or part of an HPC1 gene from said sample using a set of primers to produce 
amplified nucleic acids and sequencing said amplified nucleic acids. 

54. The method of claim 44 wherein a germline nucleic acid sequence is compared by
amplifying all or part of an HPC1 gene using a primer specific for a specific HPC1 mutant
allele and detecting the presence of an amplified product, wherein a presence of said
product indicates the presence of said specific allele.

55. The method of claim 44 wherein a germline nucleic acid sequence is compared by 
molecularly cloning all or part of said HPC1 gene from said sample to produce cloned 
nucleic acid and sequencing the cloned nucleic acid. 

56. The method of claim 44 wherein a germline nucleic acid sequence is compared by
obtaining a first HPC1 gene fragment from (a) HPC1 gene genomic DNA isolated from
said sample, (b) HPC1 RNA isolated from said sample or (c) HPC1 cDNA made from
mRNA isolated from said sample, obtaining a second HPC1 gene fragment from (a)
wild-type HPC1 genomic DNA, (b) wild-type HPC1 RNA or (c) wild-type cDNA made from
wild-type mRNA, said second HPC1 gene fragment corresponding to said first HPC1 gene
fragment, forming single-stranded DNA from said first HPC1 gene fragment and from said
second HPC1 gene fragment, forming a heteroduplex consisting of single-stranded DNA
from said first HPC1 gene fragment and single-stranded DNA from said second HPC1
gene fragment, analyzing the heteroduplex to determine if said single-stranded DNA from
said first HPC1 gene fragment has a mismatch relative to said single-stranded DNA from
said second HPC1 gene fragment and sequencing said first single-stranded DNA from said
first HPC1 gene fragment having a mismatch.

57. The method of claim 44 wherein a germline nucleic acid sequence is compared by
amplifying HPC1 nucleic acids from said sample to produce amplified nucleic acids,
hybridizing the amplified nucleic acids to an HPC1 DNA probe specific for an HPC1 allele
and detecting the presence of a hybridization product, wherein the presence of said product
indicates the presence of said allele in the subject.

58. The method of claim 44 wherein the alteration in the germline sequence is detected by
amplification of HPC1 gene sequences in said tissue and hybridization of the amplified
sequences to nucleic acid probes which comprise mutant HPC1 gene sequences.

59. The method of claim 44 wherein a germline nucleic acid sequence is compared by
analyzing HPC1 nucleic acids in said sample for a mutation selected from the group
consisting of a deletion mutation, a point mutation and an insertion mutation.

60. The method of claim 44 wherein a germline nucleic acid sequence is compared by
hybridizing the tissue sample in situ with a nucleic acid probe specific for an HPC1 allele
and detecting the presence of a hybridization product, wherein a presence of said product
indicates the presence of said allele in the subject.

61. The method of claim 48 wherein the alteration in the germline sequence is detected by 
immunoblotting. 



62. The method of claim 48 wherein the alteration in the germline sequence is detected by 
immunocytochemistry. 

63. The method of claim 48 wherein the alteration in the germline sequence is detected by
assaying for binding interactions between HPC1 gene protein isolated from said tissue and 
its ligand. 

64. The method of claim 48 wherein the alteration in the germline sequence is detected by 
assaying for the inhibition of biochemical activity of its binding partner.

65. The method of claim 48, wherein an HPC1 polypeptide from a tissue sample from said
subject is analyzed for an altered HPC1 polypeptide by (i) detecting either a full-length
HPC1 polypeptide or a truncated HPC1 polypeptide or (ii) contacting an antibody which
specifically binds to an epitope of an altered HPC1 polypeptide to the HPC1 polypeptide
from said sample and detecting bound antibody, wherein the presence of a truncated
protein or bound antibody indicates the presence of a germline alteration in the HPC1
gene.

66. The method of claim 65 wherein an HPC1 polypeptide is analyzed by detecting a
truncated HPC1 polypeptide. 

67. The method of claim 65 wherein an HPC1 polypeptide is analyzed by contacting an
antibody which specifically binds to an epitope of an altered HPC1 polypeptide to the
HPC1 polypeptide from said sample.

68. A kit for screening for an alteration in an HPC1 gene in a human subject which comprises
at least one antibody (i) which specifically binds to wild-type HPC1 polypeptide but not a
truncated HPC1 polypeptide or (ii) which specifically binds to an epitope of an altered
HPC1 polypeptide.
69. A method for screening a tumor sample from a human subject for the presence of a
somatic alteration in an HPC1 gene in said tumor which comprises comparing a sequence
of an HPC1 gene from said tumor sample, or HPC1 RNA from said tumor sample, or a
sequence of HPC1 cDNA made from mRNA from said sample or an HPC1 polypeptide
isolated from said sample with sequences of wild-type HPC1 gene, wild-type HPC1 RNA,
wild-type HPC1 cDNA or a wild-type HPC1 polypeptide from a nontumor sample from
said subject, wherein a difference in the sequence of the HPC1 gene, HPC1 RNA, HPC1
cDNA or HPC1 polypeptide of the tumor sample from the nontumor sample indicates an
alteration in the HPC1 gene in said tumor sample.

70. The method of claim 69 wherein said wild-type HPC1 gene has a nucleotide sequence set
forth in any one or more of SEQ ID NOs:1-31.

71. The method of claim 69 wherein the nucleic acid sequence of HPC1 RNA from the subject
is compared to nucleic acid sequences of wild-type HPC1 gene, HPC1 RNA or HPC1
cDNA.

72. The method of claim 69 wherein the nucleic acid sequence is compared by hybridizing an
HPC1 gene probe which specifically hybridizes to an HPC1 allele to RNA isolated from
said tumor sample and detecting the presence of a hybridization product, wherein a
presence of said product indicates the presence of said allele in the tumor sample.

73. The method of claim 69 wherein the HPC1 polypeptide from the tumor sample is 
compared to the HPC1 polypeptide from the nontumor sample. 

74. The method of claim 69 wherein a regulatory region of the HPC1 gene from said tumor 
sample is compared with a regulatory region of the HPC1 gene from the nontumor sample. 

75. The method of claim 69 wherein the nucleic acid sequence is compared by obtaining a first
HPC1 gene fragment from an HPC1 gene from the tumor sample and a second HPC1 gene
fragment from the nontumor sample, said second fragment corresponding to said first
fragment, forming single-stranded DNA from said first HPC1 gene fragment and from said
second HPC1 gene fragment, electrophoresing said single-stranded DNAs on a
non-denaturing polyacrylamide gel, comparing the mobility of said single-stranded DNAs on
said gel to determine if said single-stranded DNA from said first HPC1 gene fragment is
shifted relative to said second HPC1 gene fragment and sequencing said single-stranded
DNA from said first HPC1 gene fragment having a shift in electrophoretic mobility.

76. The method of claim 69 wherein a nucleic acid sequence is compared by hybridizing an
HPC1 gene probe which specifically hybridizes to an HPC1 allele to genomic DNA 
isolated from said tumor sample and detecting the presence of a hybridization product, 
wherein a presence of said product indicates the presence of said allele in the tumor. 

77. The method of claim 69 wherein the alteration in the nucleic acid sequence is detected by
hybridization with an allele-specific probe. 

78. The method of claim 69 wherein a nucleic acid sequence is compared by amplifying all or
part of an HPC1 gene from said tumor sample using a set of primers to produce amplified 
nucleic acids and sequencing said amplified nucleic acids. 

79. The method of claim 69 wherein a nucleic acid sequence is compared by amplifying all or
part of an HPC1 gene in the tumor sample using a primer specific for a specific HPC1 
mutant allele and detecting the presence of an amplified product, wherein a presence of said 
product indicates the presence of said specific allele. 

80. The method of claim 69 wherein a nucleic acid sequence is compared by molecularly
cloning all or part of said HPC1 gene from said tumor sample to produce cloned nucleic 
acid and sequencing the cloned nucleic acid. 

81. The method of claim 69 wherein a nucleic acid sequence is compared by obtaining a first
HPC1 gene fragment from (a) HPC1 gene genomic DNA isolated from said tumor sample,
(b) HPC1 RNA isolated from said tumor sample or (c) HPC1 cDNA made from mRNA
isolated from said tumor sample, obtaining a second HPC1 gene fragment from (a) HPC1
genomic DNA from said nontumor sample, (b) HPC1 RNA from said nontumor sample or
(c) cDNA made from said mRNA from said nontumor sample, said second HPC1 gene
fragment corresponding to said first HPC1 gene fragment, forming single-stranded DNA
from said first HPC1 gene fragment and from said second HPC1 gene fragment, forming a
heteroduplex consisting of single-stranded DNA from said first HPC1 gene fragment and
single-stranded DNA from said second HPC1 gene fragment, analyzing the heteroduplex to
determine if said single-stranded DNA from said first HPC1 gene fragment has a mismatch
relative to said single-stranded DNA from said second HPC1 gene fragment and
sequencing said first single-stranded DNA from said first HPC1 gene fragment having a
mismatch.



82. The method of claim 69 wherein a nucleic acid sequence is compared by amplifying HPC1
nucleic acids from said tumor sample to produce amplified nucleic acids, hybridizing the
amplified nucleic acids to an HPC1 DNA probe specific for an HPC1 allele and detecting
the presence of a hybridization product, wherein the presence of said product indicates the
presence of said allele in the tumor sample.

83. The method of claim 69 wherein the alteration in the sequence is detected by amplification 
of HPC1 gene sequences in said tumor sample and hybridization of the amplified 
sequences to nucleic acid probes which comprise mutant HPC1 gene sequences. 

84. The method of claim 69 wherein a nucleic acid sequence is compared by analyzing HPC1
nucleic acids in said sample for a mutation selected from the group consisting of a deletion
mutation, a point mutation and an insertion mutation.

85. The method of claim 69 wherein a nucleic acid sequence is compared by hybridizing the
tissue sample in situ with a nucleic acid probe specific for an HPC1 allele and detecting the 
presence of a hybridization product, wherein a presence of said product indicates the 
presence of said allele in the tumor sample. 

86. The method of claim 73 wherein the alteration in the sequence is detected by
immunoblotting. 
87. The method of claim 73 wherein the alteration in the sequence is detected by
immunocytochemistry. 

88. The method of claim 73 wherein the alteration in the sequence is detected by assaying for
binding interactions between HPC1 gene protein isolated from said tumor sample and its
ligand.

89. The method of claim 73 wherein the alteration in the sequence is detected by assaying for 
the inhibition of biochemical activity of its binding partner. 


90. The method of claim 73, wherein an HPC1 polypeptide from a tumor sample from said 
subject is analyzed for an altered HPC1 polypeptide by (i) detecting either a full length 
HPC1 polypeptide or a truncated HPC1 polypeptide or (ii) contacting an antibody which 
specifically binds to an epitope of an altered HPC1 polypeptide to the HPC1 polypeptide 

from said sample and detecting bound antibody, wherein the presence of a truncated

protein or bound antibody indicates the presence of an alteration in the HPC1 gene. 

91. The method of claim 90 wherein an HPC1 polypeptide is analyzed by detecting a 
truncated HPC1 polypeptide. 


92. The method of claim 90 wherein an HPC1 polypeptide is analyzed by contacting an 
antibody which specifically binds to an epitope of an altered HPC1 polypeptide to the 
HPC1 polypeptide from said sample. 
SEQUENCE LISTING

<110> Myriad Genetics, Inc. 

<120> 

<130> 

<140> 
<141> 

<160> 

<170> 

(2) INFORMATION FOR SEQ ID NO: 1:



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 74 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens



(viii) NAME: exon.g1m20
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



1 GTGCTCAAAA AACATTTGTT GAGTAAGTGA ACCTGAGACT ATCAACAAGC

51 ATTATTTTAA AATCACTAGC AAAG



(2) INFORMATION FOR SEQ ID NO: 2:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 224 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g1m20.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 141-214 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

1 tggtattttt ggttgattta ttaatttctt ttctgtcttt ccactctaat 

51 cilaaacata aatgtcccga agcagaaacc ctatacaact ^cttcagtac 

101 ttttcctacc aatgcttaaa aggatgcctt tttattgtag GTGCTCAAAA

151 AACATTTGTT GAGTAAGTGA ACCTGAGACT ATCAACAAGC ATTATTTTAA

201 AATCACTAGC AAAGgtaagt aagt 

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 161 base pairs 

(B) TYPE: nucleic acid 
(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g2m13

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

1 AAAAAAAGCC ATTTTCCTCT CTCCACCACA TACCACACTT CTGGGATTCT 

51 GAAATATCTT ACCGAGCACT TTATTCAATC TAAATTTAAA TAGAAGTTTT 

101 CCCCACTTCC CAAGAGAGAA ACAACAACGA ACTAGATGAG AATGAGAGGA 

151 ACTGGAAGAA G 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 253 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g2m13.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 40-200

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

1 ctcacaggcc aacgtattca aaagctccag gaaaaaaaaA AAAAAAGCCA 

51 TTTTCCTCTC TCCACCACAT ACCACACTTC TGGGATTCTG AAATATCTTA 

101 CCGAGCACTT TATTCAATCT AAATTTAAAT AGAAGTTTTC CCCACTTCCC 

151 AAGAGAGAAA CAACAACGAA CTAGATGAGA ATGAGAGGAA CTGGAAGAAG 

201 gtaatgctcc atatcatttg gttaatctat tcttgtttat taatttatta

251 cat 



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 230 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g3m1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

1 TAATGAAATC TGAGAAGCTG AATTTAGCAA TACAGATGCA AACTGTGCCA 
51 TCAGAAGATT AAAATGAAAG TGAAATGTCC TGAAAATATC AGAATCGCAT 
101 CAGTAATAGA AGTAAATGAA AAGTGAAGAC CTCTTTGAAT TATCTTATTT 
151 CATTTGACTA TGTTCCTCCT GAGTCACAAA AAAAGGATGT TACAGCTATT 
201 TTTCTTAAGC TGATGGGCCA AAAGATTGTA 



(2) INFORMATION FOR SEQ ID NO: 6: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 310 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: g3m1.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 51-280

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

1 taattcatat gaagaaaagt tttttgtatt tgttttttgg taaaatgcag 

51 TAATGAAATC TGAGAAGCTG AATTTAGCAA TACAGATGCA AACTGTGCCA 

101 TCAGAAGATT AAAATGAAAG TGAAATGTCC TGAAAATATC AGAATCGCAT 

151 CAGTAATAGA AGTAAATGAA AAGTGAAGAC CTCTTTGAAT TATCTTATTT 

201 CATTTGACTA TGTTCCTCCT GAGTCACAAA AAAAGGATGT TACAGCTATT 

251 TTTCTTAAGC TGATGGGCCA AAAGATTGTA gtaagtattt tgacgattgt 

301 ctggtggggg 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 977 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g4m17a

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

1 TTCTAGGCTC CAGGGACACA AGAGTGACCA AAACTGACAG AACAGGTCAT 

51 CATGGAGCTT TCAATGTGGC TGAGGATACC TGAGGAAAAA AACATCTGCT 

101 GTCAACCTAA ACTAAACTTT GCTATCAACC ACAATAGGAT AACACAGGGT 

151 GCTGAGAGAA TAACACAGAG GTGCTTGAGT AACATGAGCT CCTAACTGAC 

201 CACCACTAAG TATGTGGTTC TACCAAATGA ATAAAATCAC TGGCCATCCT 

251 ATTAGTTTTG TCTTTGCTTA GCCTAATGAT AGGAAACGAA TGGAACTTTG 

301 CCATCAAACG AATACAAGAT AATCCCTATT GCTTGAATTC TTAATTGTTT 

351 TTTGGACTAC CGATATATTT TACTTCATTT GGTACCGTAG TTTACATATT 

401 TTAAATAAAG TATTATACAT AATATCTTAC ATAAATATGA ATGATTTCAA 

451 GGTGCTACAT CCCTGGAATT GTTATATTTT AATCTGTTGA TGTTGCTAAA

501 ACCCACAGTG GTGGCAATGT TACTAAAATT GAATTCATTC TTTCACTTGA 

551 AGGTAATGCA TAAAGAGGTA AGGCCAAAAT ACTGAACGAA ACTTAATATA 

601 GTAATTAAAG TGTGCTGTAA ATTTTGCACT TCTTATTAAA GTTTAGTTTC 

651 AATTAATTAG CATCTTGGAA ATAAAAAGGA TAATTTTTAA AGTATTCTAA 

701 TTTTCAATAA ATAAAAAGAA AATATTACTA AGATTCCTAG GATATATTGA 

751 TCAATACTAT CCATTAATGT AACTGAAAGT GGTTAGAAAA TTTCAGGACC 

801 AACTACTGTA AACTAAAATC AGAGCTTTAG TTCATCCATA GCCAAAAATA 

851 TTTATCTCTG TGTTCTGCAG AGCATAAAGC TAGGGAAGTA AATAAGCATT 

901 GTGTGAAATT TGTGATTAAA ATGAACTTCT ATTTAAATAA AAAAGAAAAA 

951 GCACTTGATA AGaaaaaaaa aaaaaaa 



(2) INFORMATION FOR SEQ ID NO: 8: 



WO 00/12694 



PCT/US99/19508 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 170 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 






(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g4m17b

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:



1 TTCTAGGCTC CAGGGACACA AGAGTGACCA AAACTGACAG AACAGGTCAT

51 CATGGAGCTT TCAATGTGGC TGAGGATACC TGAGGAAAAA AACATCTGCT 

101 GTCAACCTAA ACTAAACTTT GCTATCAACC ACAATAGGAT AACACAGGGT 

151 GCTGAGAGAA TAACACAGAG 



(2) INFORMATION FOR SEQ ID NO: 9: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1090 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: g4m17.genomic

(ix) FEATURE: 

(A) NAME/KEY: ALTERNATE EXON (a): 57-1018

(A) NAME/KEY: ALTERNATE EXON (b): 57-226



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

[Only the first ten bases of each line are legible in the source.]
1 atttattcag
51 atgcagTTCT
101 GGTCATCATG
151 TCTGCTGTCA
201 CAGGGTGCTG
251 ACTGACCACC
301 CATCCTATTA
351 ACTTTGCCAT
401 TTGTTTTTTG
451 CATATTTTAA
501 TTTCAAGGTG
551 GCTAAAACCC
601 ACTTGAAGGT
651 AATATAGTAA
701 AGTTTCAATT
751 TTCTAATTTT
801 TATTGATCAA
851 AGGACCAACT
901 AAAATATTTA
951 AGCATTGTGT
1001 GAAAAAGCAC
1051 tatctatatt



(2) INFORMATION FOR SEQ ID NO:10: 



BNSDOCID: <WO 0012694A1_I_> 






(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 220 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g5m19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

1 AAAAATAACC TGATACAAGC AGAATTGTCA AGGAATGTCA TATGTCAAAA 

51 AATGAAGTGA CAAATGAGAA AGGGCTGAGT CACTTAGAAA ATATCTTGCA

101 GACAGCCAAC GGAAACAAGC ACTAAATTAA AGAGAGTGGC TGGTGGAAAT 

151 AGGTCAAAAC AGCCCAGTGA AGCATAGTCG TTATATGACA AAATGACCAG 

201 AAGATGTGTT TACATTACTG 






(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 356 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: g5m19.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 65-284

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

1 ttttaaactt gtttctcttt cttttattaa atacaaccct gtattggcat 

51 tttccatttg atagAAAAAT AACCTGATAC AAGCAGAATT GTCAAGGAAT

101 GTCATATGTC AAAAAATGAA GTGACAAATG AGAAAGGGCT GAGTCACTTA 

151 GAAAATATCT TGCAGACAGC CAACGGAAAC AAGCACTAAA TTAAAGAGAG 

201 TGGCTGGTGG AAATAGGTCA AAACAGCCCA GTGAAGCATA GTCGTTATAT 

251 GACAAAATGA CCAGAAGATG TGTTTACATT ACTGgtatgt aaactacata 

301 ttatcagagg cttgatattg gtgaccttca aattaagtac cttaattttg

351 tcggat 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 114 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g6m18

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 

1 GTGGTCAAGA AAAGTTTCAT TGAGAAGATA CAATTTGGTC AGAGACTTGA 
51 TAAGGTGAAG CTCATCCTGT TGATATCACC CTGTGGAGAA ACAGCATTTC
101 CAGGAGGTAA AAATA

(2) INFORMATION FOR SEQ ID NO: 13: 


(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 286 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g6m18.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 80-194

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

1 ctactgcaaa aagtaataaa ataaagaaga aaagggagat aagtagtgcc

51 attaggtaat gactttcatt ttaaatgagG TGGTCAAGAA AAGTTTCATT

101 GAGAAGATAC AATTTGGTCA GAGACTTGAT AAGGTGAAGC TCATCCTGTT 

151 GATATCACCC TGTGGAGAAA CAGCATTTCC AGGAGGTAAA AATAgtaagt

201 acaagaggct cctagacgga caagcttggt attcttgaga gagcaaggag 

251 accagtgtgg ctgctatgta cagtggccaa ggggaa 

(2) INFORMATION FOR SEQ ID NO: 14:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 141 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g7m10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 



51 ATGGAGTTGT CTCACTCTGG CACGATCTCA GCTCACTGCA ACCTCCGCCT

101 CCTGGGTTCG AGAGATTCTC CTGCCTCACT CAGCCTCCCA G 



(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 214 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g7m10.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 33-181

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

1 atttgttttt cctataccca cttccaaagt agTATGTATT TTCTTTCGCC

51 ACACTGTTTA CATTTATTTA TTTGTTTATT TATTTTTGAG ATGGAGTTGT 

101 CTCACTCTGG CACGATCTCA GCTCACTGCA ACCTCCGCCT CCTGGGTTCG 

151 AGAGATTCTC CTGCCTCACT CAGCCTCCCA Ggtagccggg attacaggca 

201 tgcaccacca tgcc 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 93 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g8m11

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

1 ACATAGCTGG AAGGCACCAT CCATGAACCA ACAAATAGGC TCTCACCAGA 
51 AATCAAATAT TCCTTGATCT TGGATTTTTC AGCCTCCAGA ACT

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 163 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g8m11.genomic

(ix) FEATURE:
(A) NAME/KEY: EXON: 39-131

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

1 aagggagctt atttgttcct tctatctatc atttatagAC ATAGCTGGAA
51 GGCACCATCC ATGAACCAAC AAATAGGCTC TCACCAGAAA TCAAATATTC 
101 CTTGATCTTG GATTTTTCAG CCTCCAGAAC Tgtgagatat aaatttctgt 
151 tgtttacaag cta

(2) INFORMATION FOR SEQ ID NO: 18: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 59 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g9m21 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

1 TGATAGACTC ACAAGCCCTA TGTGGCATCA TGTAAGAATT ATCTTACTCT
51 TGAACTGAG 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 183 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: g9m21.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 61-119

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

1 tatgcatgaa gaatcagtct ggataagagt gctcacttct gcctctactg 

51 tttctttcag TGATAGACTC ACAAGCCCTA TGTGGCATCA TGTAAGAATT

101 ATCTTACTCT TGAACTGAGg tatggtgaag gaaatgccac atcattttgt

151 ttaatttacc aatatatctc caaataaact gcc 

(2) INFORMATION FOR SEQ ID NO: 20:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 103 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(viii) NAME: exon.g10m2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

1 GGTCAGATGA AAGTGAGATC CATACATCCT TCTTCAGCAA CTTGTGCCTC

51 TGCTCTGCAC CTCCCGCAAT TAACTACTGA AAAAAGAACA CAGCTTCACA 
101 AAA 

(2) INFORMATION FOR SEQ ID NO: 21:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 215 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g10m2.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 36-138

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

1 gatcactatt ttcccctcca ctttaccccc tgcagGGTCA GATGAAAGTG 

51 AGATCCATAC ATCCTTCTTC AGCAACTTGT GCCTCTGCTC TGCACCTCCC 

101 GCAATTAACT ACTGAAAAAA GAACACAGCT TCACAAAAgt gagttgaagt

151 gcataccaca gaaccatcaa attccatatt tagaacaaga aaaattctat 

201 aagacactat ttctg 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 70 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g11m3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

1 GAGATTGTAA AATCAGGAAG TATATCTAAG TCACCTCCAG TAGCCGTAAC

51 TCTACCTTGT CCAGTAAAAG 






(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 134 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g11m3.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 32-101

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:



1 tgtactatat gttttcatct tgtatttcta gGAGATTGTA AAATCAGGAA 
51 GTATATCTAA GTCACCTCCA GTAGCCGTAA CTCTACCTTG TCCAGTAAAA 
101 Ggtaagtatg tgaaatagta taatttagaa gtaa 
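
In the genomic entries of this listing, each EXON feature gives 1-based, inclusive coordinates, with exonic bases printed in uppercase and flanking intronic bases in lowercase. The short Python sketch below is an editorial illustration, not part of the original listing: it extracts the annotated exon of SEQ ID NO: 23 and compares it with the cDNA entry SEQ ID NO: 22, and it reads off the two intronic bases on each side of the exon, which for this pair follow the common GT-AG splice-site pattern. The helper names `join_blocks` and `extract_feature` are hypothetical, and the sequences are hand-transcribed from the two entries above.

```python
# Editorial sketch (not from the original listing): check that the EXON
# feature of a genomic entry reproduces its cDNA entry.


def join_blocks(lines):
    """Join listing lines such as '51 GTATATCTAA ...' into one string,
    dropping the leading position number of each line."""
    return "".join("".join(line.split()[1:]) for line in lines)


def extract_feature(seq, start, end):
    """Return bases start..end of seq using 1-based, inclusive coordinates."""
    return seq[start - 1:end]


# SEQ ID NO: 23 (genomic DNA), EXON: 32-101, transcribed by hand.
genomic = join_blocks([
    "1 tgtactatat gttttcatct tgtatttcta gGAGATTGTA AAATCAGGAA",
    "51 GTATATCTAA GTCACCTCCA GTAGCCGTAA CTCTACCTTG TCCAGTAAAA",
    "101 Ggtaagtatg tgaaatagta taatttagaa gtaa",
])

# SEQ ID NO: 22 (cDNA), transcribed by hand.
cdna = join_blocks([
    "1 GAGATTGTAA AATCAGGAAG TATATCTAAG TCACCTCCAG TAGCCGTAAC",
    "51 TCTACCTTGT CCAGTAAAAG",
])

exon = extract_feature(genomic, 32, 101)
acceptor = extract_feature(genomic, 30, 31)  # last two intronic bases before the exon
donor = extract_feature(genomic, 102, 103)   # first two intronic bases after the exon

print(len(genomic), len(cdna))          # 134 70
print(exon == cdna)                     # True
print(acceptor.upper(), donor.upper())  # AG GT
```

The computed lengths agree with the stated LENGTH fields (134 and 70 base pairs). The same two helpers could be pointed at other genomic/cDNA pairs in the listing, although not every pair is expected to match base for base.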

(2) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 241 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g12m14

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

1 GGACAGGACC TCTATATGTT GCTCAGGCTG GAGTGTGGCA GCTATTCACG

51 GATGTGATTA TAGTACACTA CAGCCTCAAC TCCTGGGCTC AAGCAAAGCT 

101 TCCCAAATCG GTGGGACTAT AGGCACACGC CACTGTGCTG TTCAATAATA 

151 AGATTTCTGT CTAACACCAC TGCGCCTGTT TCCTTGATAA ATATTTATTA

201 TCTGTGTTTA TTTATTTTAT CCTGGGAGAA CATACAAGAT T 






(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 273 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g12m14.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 33-220

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

1 ttcttttctt ttcttttctt tttaattttt agGGACAGGA CCTCTATATG 

51 TTGCTCAGGC TGGAGTGTGG CAGCTATTCA CGGATGTGAT TATAGTACAC 

101 TACAGCCTCA ACTCCTGGGC TCAAGCAAAG CTTCCCAAAT CGGTGGGACT 

151 ATAGGCACAC GCCACTGTGC TGTTCAATAA TAAGATTTCT GTCTAACACC

201 ACTGCGCCTG TTTCCTTGAT aaatatttat tatctgtgtt tatttatttt 

251 atcctgggag aacatacaag att 

(2) INFORMATION FOR SEQ ID NO: 26:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 432 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 



(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: exon.g13m22

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



1 GTTGTGAAAG AAAAATAAAT CTTGGGGCTC CAAAATCACT ACGCTAAAGG 

51 GAATAGTCAA GCTAGGAGCT GCTTACAGCA AACCTGCCTC CCATTCTATT 

101 CAAAGTCACC CCTCTGCTCA GAGATAAATG CATATCTGAT TGCCTCCTTT 

151 GGAGAGGCCA ATCAGAAACT CAAAAGAATG CAACTATTCA TCTCTTATCT 

201 ACCTATGACT TGGAAGCCCA CTCCCTGCTT CAAGTTGTCC CACCTTTGCT

251 TCAAGTTGTC CAGCCTTTTC TGGACAGAAC CAGTGTTTAT CTTACATATA 

301 TTGACTGATG TCTCATGTCT CTCTAAAACA TATAAAACCA AGCTGTGCTC 

351 TTGAAGTGGC CTCATCATCT GGGGTGACAC CCGAGGTTCG TTGTCTCACG 

401 GCCATAAAGA TAAAGAACAT GGACACAAAA GG



(2) INFORMATION FOR SEQ ID NO: 27: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 702 base pairs

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: g13m22.genomic

(ix) FEATURE:

(A) NAME/KEY: EXON: 139-570

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

1 atctaaagac atcactgggg gccatttgtc gacatgggtg ggccatttgt

51 tgaagaaaag ttaaggtaag tatggttaaa tatttcttct tgaacttgtt 

101 agattctcac agaaaatatt ctgttctctt gcctgcagGT TGTGAAAGAA 

151 AAATAAATCT TGGGGCTCCA AAATCACTAC GCTAAAGGGA ATAGTCAAGC

201 TAGGAGCTGC TTACAGCAAA CCTGCCTCCC ATTCTATTCA AAGTCACCCC 

251 TCTGCTCAGA GATAAATGCA TATCTGATTG CCTCCTTTGG AGAGGCCAAT

301 CAGAAACTCA AAAGAATGCA ACTATTCATC TCTTATCTAC CTATGACTTG

351 GAAGCCCACT CCCTGCTTCA AGTTGTCCCA CCTTTGCTTC AAGTTGTCCA 

401 GCCTTTTCTG GACAGAACCA GTGTTTATCT TACATATATT GACTGATGTC

451 TCATGTCTCT CTAAAACATA TAAAACCAAG CTGTGCTCTT GAAGTGGCCT

501 CATCATCTGG GGTGACACCC GAGGTTCGTT GTCTCACGGC CATAAAGATA

551 AAGAACATGG ACACAAAAGG gtgaggttta gagcagaaat ttaataggtg 

601 aaagaaagtg aacagctctc tgctacagag aggggtccca gaaaaatggg

651 ttgccgattc acagtttgga tacagaggct tttataagaa atcaatagtg 

701 gg 






(2) INFORMATION FOR SEQ ID NO: 28: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 128 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g14m12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:



1 CTATGGTAAC AGTTCCCTTC ATTACTTACT CAAACTTCAG AGAGATAAAA 
51 GAGAAGGAGT CACAGCATCT TTGTGCAAAA TATGCCTCGT TTTCTGGGAA 
101 AAGGCTTGTT TCAGAAGAGA AGACAGTG 

(2) INFORMATION FOR SEQ ID NO: 29: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 204 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: g14m12.genomic

(ix) FEATURE:

(A) NAME/KEY: EXON: 37-164

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

1 [illegible in source]
51 [illegible in source]
101 [illegible in source]
151 [illegible in source]
201 aatt

(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: [illegible in source]

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: [illegible in source]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

[illegible in source]

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

[Remaining fields illegible in source.]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

1 aacagpggaa ttacatgacc aactttcatt ttatcagGAT ^TTCTGGCA

51 [illegible in source]

101 TCCAAGATTG ATGATGGTGG CTCAGATCAG gttggtagca gtgcagatW 

(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: [illegible in source]

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens
(viii) NAME: exon.g16m23

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

1 AACTCAGAAG TGATTCTCTA CTGAAATGAT ACAGTGTGCC CCAAAGACAT

51 TGCATTAGAA CAATGGTAAT GGCAGGACTC TATGCCCATG ATTTCAGGAT 
101 ACTTCTTCTA GGAATTAACT ATAT 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 324 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g16m23.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 89-212

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 



1 atctatgtaa acacagaaaa aatataagaa ctaaccccag gcaatgactt 

51 ggtactgttt aaagaaagca ttcttcctta ttttctagAA CTCAGAAGTG

101 ATTCTCTACT GAAATGATAC AGTGTGCCCC AAAGACATTG CATTAGAACA

151 ATGGTAATGG CAGGACTCTA TGCCCATGAT TTCAGGATAC TTCTTCTAGG 

201 AATTAACTAT ATgtaagtgt cttttttatt gaaaatattg gactagctac 

251 atcgagatgc ctttctgggt ttttttgcca ttagccaatt atgttagttt 

301 tatgcttttg ctttatttca taag



(2) INFORMATION FOR SEQ ID NO: 34: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 59 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g17m24

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:



1 ATTCTAGGGT ATCGATTCAT TTATAGTGGG GCAATCTTGC TGGAGATTAT
51 AGAGAAAAG 

(2) INFORMATION FOR SEQ ID NO: 35: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 208 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: g17m24.genomic

(ix) FEATURE:

(A) NAME/KEY: EXON: 52-110

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

1 [illegible in source] aaaatgtatc tttttttctc tctgaaaata
51 gATTCTAGGG TATCGATTCA TTTATAGTGG GGCAATCTTG CTGGAGATTA
101 TAGAGAAAAG gtcagttttc tgccaaatta ctttcattat gttgtccaag
151 [illegible in source]
201 tatatctg

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 63 base pairs

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: exon.g18m25

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

1 [illegible in source]
51 CAAAGCATCA AAG

(2) INFORMATION FOR SEQ ID NO: 37:

[Sequence characteristics illegible in source.]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

1 [illegible in source]
51 ttgagggacc ctcctgccca gacaaatgat ttgttcatta cagAAACTTC
101 [illegible in source]
151 TCAAAGgtca gttttctgta aaattacttt tgttacattg tccctaaaaa
201 [illegible in source]
(2) INFORMATION FOR SEQ ID NO: 38:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: [illegible in source]

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g19m5a

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

1 TGCTGCTTGT TGCTGACGTT TCTGCCCAAA CATCATGAAA TGTGGAGAAG 
51 AAAGAACACT CAGATGTGCC CTGGAGACTT TCCTTTTATT TTTAAACAG 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 122 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g19m5b

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

1 TGCTGCTTGT TGCTGACGTT TCTGCCCAAA CATCATGAAA TGTGGAGAAG 
51 AAAGAACACT CAGATGTGCC CTGGAGACTT TCCTTTTATT TTTAAACAGG 
101 TACATCCTGA AATTATGTTA GA 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g19m5.genomic

(ix) FEATURE: 

(A) NAME/KEY: ALTERNATE EXON (a): 39-137
(A) NAME/KEY: ALTERNATE EXON (b): 39-160

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

1 ttacataatc tcactttttc ttattcgttt ccttttagTG CTGCTTGTTG 

51 CTGACGTTTC TGCCCAAACA TCATGAAATG TGGAGAAGAA AGAACACTCA 

101 GATGTGCCCT GGAGACTTTC CTTTTATTTT TAAACAGGTA CATCCTGAAA 

151 TTATGTTAGA gtaagtttca acatattcaa aatcatgtca taaa 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 191 base pairs

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: [illegible in source]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

[illegible in source]
[illegible] RATACATCTA GATTCACATG CTCAAAATCT AAAAGATTCA [illegible]
151 TTTAAATTAG TCTCTAGTCT TCAGCCCTCA CCAGTGCACA A

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 262 base pairs

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: [illegible in source]

(ix) FEATURE:

(A) NAME/KEY: [illegible in source]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

[illegible in source]

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: [illegible in source]

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: [illegible in source]

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

1 [illegible in source]
51 CGAAGGAAAG CTTCCAATTA TGGGGAACAA GTCCTCTGAA GTGGCTAAAT
101 [illegible in source]

(2) INFORMATION FOR SEQ ID NO: 44: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 190 base pairs

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens 

(viii) NAME: g21m7.genomic

(ix) FEATURE:

(A) NAME/KEY: EXON: 32-153

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 

1 tggtttgttt ctgtttttgt tattgtttca gTTTTTTTTT CCATTGGGTT

51 TGACCAACTC TATATTCGAC TTGAACAAAT CCGAAGGAAA GCTTCCAATT 

101 ATGGGGAACA AGTCCTCTGA AGTGGCTAAA TTCCCACACA CACAAAAGAA 

151 AAGgtagggt ggtggggggg agaaaaacag ccagcaaaag 






(2) INFORMATION FOR SEQ ID NO:45: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 211 base pairs 
(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 
(A) ORGANISM: Homo sapiens

(viii) NAME: exon.g22m15

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

1 ATGAGTCTAT GCCCAGGACC ACCAGATAAT TGAGTCCTGT ACAAAAGCTT 
51 CTGACTAAAC AATGTGCTCT GGCTCAGGAC TATACAGAGA AAAGACACAG 
101 TTTTTAAATT GATCGTTCAA AAGGAAACAT ATTTATGATA TTTGCTCCAT 
151 GATATGTATC TCTCATCTGT TAGCTCAGGC AGAATTAAAA TGCTAGACAA 
201 TAAAAAAAAA A

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 375 base pairs 

(B) TYPE: nucleic acid 



(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g22m15.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON: 52-254 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 



1 aatatgtaca tatcttagag aatcatttat tgtaattgtt ttcttttcca 

51 gATGAGTCTA TGCCCAGGAC CACCAGATAA TTGAGTCCTG TACAAAAGCT 

101 TCTGACTAAA CAATGTGCTC TGGCTCAGGA CTATACAGAG AAAAGACACA 

151 GTTTTTAAAT TGATCGTTCA AAAGGAAACA TATTTATGAT ATTTGCTCCA 

65 201 TGATATGTAT CTCTCATCTG TTAGCTCAGG CAGAATTAAA ATGCTAGACA 



251 ATAAtgttcc atgccactat gcttactcgg gtctctcaca ttggtgtact

301 tctggcacaa agtcacatga catttgagta atagctgtct cccatgtgga
351 ctttacacca catgatgtca ggggc
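Comparing SEQ ID NO: 45 (cDNA) with the uppercase exon of SEQ ID NO: 46 (positions 52-254) suggests that the cDNA is that exon followed by a short run of A residues. The Python sketch below checks this under the same assumption that uppercase letters mark exonic bases; only lines 1-251 of SEQ ID NO: 46 are copied, which is enough to contain the whole exon.

```python
import re

# SEQ ID NO: 45 (cDNA) and lines 1-251 of SEQ ID NO: 46 (genomic), copied from the listing.
SEQ_45_LISTING = """
1 ATGAGTCTAT GCCCAGGACC ACCAGATAAT TGAGTCCTGT ACAAAAGCTT
51 CTGACTAAAC AATGTGCTCT GGCTCAGGAC TATACAGAGA AAAGACACAG
101 TTTTTAAATT GATCGTTCAA AAGGAAACAT ATTTATGATA TTTGCTCCAT
151 GATATGTATC TCTCATCTGT TAGCTCAGGC AGAATTAAAA TGCTAGACAA
201 TAAAAAAAAA A
"""
SEQ_46_LISTING = """
1 aatatgtaca tatcttagag aatcatttat tgtaattgtt ttcttttcca
51 gATGAGTCTA TGCCCAGGAC CACCAGATAA TTGAGTCCTG TACAAAAGCT
101 TCTGACTAAA CAATGTGCTC TGGCTCAGGA CTATACAGAG AAAAGACACA
151 GTTTTTAAAT TGATCGTTCA AAAGGAAACA TATTTATGAT ATTTGCTCCA
201 TGATATGTAT CTCTCATCTG TTAGCTCAGG CAGAATTAAA ATGCTAGACA
251 ATAAtgttcc atgccactat gcttactcgg gtctctcaca ttggtgtact
"""


def parse_listing(text: str) -> str:
    """Keep only sequence letters, dropping position numbers and whitespace."""
    return "".join(ch for ch in text if ch.isalpha())


seq_45 = parse_listing(SEQ_45_LISTING)
seq_46 = parse_listing(SEQ_46_LISTING)

# Assumption: the single uppercase run in the genomic entry is the exon.
match = re.search(r"[A-Z]+", seq_46)
exon = match.group()
exon_coords = (match.start() + 1, match.end())  # 1-based, inclusive

tail = seq_45[len(exon):]  # bases the cDNA carries beyond the genomic exon
print(len(seq_45), exon_coords, seq_45.startswith(exon), tail)  # 211 (52, 254) True AAAAAAAA
```

The 203-base exon plus eight A residues accounts for the 211 base pairs stated for SEQ ID NO: 45.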

... of fresh medium and shaking initiated. For ... to about 80% confluence, after which time the medium is ... final volume) and adenovirus added at an MOI ... stationary overnight, following which the volume is ... shaking is commenced for another 72 h.

... the requirement that the adenovirus vector be replication defective, ... the nature of the adenovirus vector is not believed to be crucial to the successful practice of the invention. The adenovirus may be of any of the 42 different known serotypes or subgroups A-F. Adenovirus type 5 of subgroup C is the preferred starting material in order to obtain the ... defective adenovirus vector for use in the present invention. This is because Adenovirus type 5 is a human adenovirus about which a great deal of biochemical and genetic information is known, and it has historically been used for most constructions employing adenovirus as a vector.

A typical vector applicable to practicing the present invention is replication defective and will not have an adenovirus E1 region. Thus, it will be most convenient to introduce the polynucleotide encoding the UC41 gene at the position from which the E1-coding sequences have been removed. However, the position of insertion of the construct within the adenovirus sequences is not critical. The polynucleotide ... may also be inserted in lieu of the deleted E3 region in E3 replacement vectors as described by ... where ...

... cycle of adenovirus does not require integration into the host cell genome. The foreign genes ... are episomal and, therefore, have low genotoxicity to host cells. No side effects have been reported in studies of vaccination with wild-type ...

(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: ...

(xi) SEQUENCE DESCRIPTION: ...

... AAACTATACT CTTAATCTGT ACCGGGAAGT TCTATACTGT TGCTTTAATC

... CAACACTCAT AATCCCTGAA TGGATAGCAC CCAATAAAAG AGAGAACATC

... TTGACCTGAA TATCTTATCT TTCCGTGTTA AATGCTCTTC TCTCTTGACC

... cgctttaaat tgtccacatc

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 497 base pairs

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE:

... GCTGTTTATT
151 TCAACCAAGA TACACCTTGA AGACAATCTG CCTGCAACAC TCATAATCCC

(B) TYPE: nucleic acid






WO 00/12694 



PCT/US99/19508 



19 



(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g24ml6 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

1 AATTAAAACG ATGTATTAAG CTGGCCTTTT TTCAGACATA CTTGCAACCA 

51 GAGTCATCAA TTTTGAAGAC AGCAACCAAG CCAAGCAATG TGAGGTTACA 

101 GCTATGAAAT AGAACAGAGA TGTCTAGACT ATAGACACAG CCTGCCGTTT 

151 TGTGCTGATT GGTAAAGTGT TCCAGCCAAC TGGAAGCAAA TATTTCTCAG 

201 AAGCAGTTTC CTGCTCTCAT CCTCTCCTCG CCATGCCCAC TGTGCCCAAC 

251 ATGGCTCCAG CTGGGTCACA GAAGACTTTG TCCTGGAATA CAGCATTTCC 

301 CTATTTAAAT CTCTAACTTT GTATGTACTC TTTTCAATAA AAGCATATTT 

351 TTCATTACCA AAAAAAAAAA 

(2) INFORMATION FOR SEQ ID NO:50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 450 base pairs

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: genomic DNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: g24ml6.genomic

(ix) FEATURE: 

(A) NAME/KEY: EXON:71-440 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

1 gtatttatag tctgtccttc ttatgtagag aacacgaatg tctaattatt 

51 tacatttatt tttatgatag AATTAAAACG ATGTATTAAG CTGGCCTTTT

101 TTCAGACATA CTTGCAACCA GAGTCATCAA TTTTGAAGAC AGCAACCAAG 

151 CCAAGCAATG TGAGGTTACA GCTATGAAAT AGAACAGAGA TGTCTAGACT 

201 ATAGACACAG CCTGCCGTTT TGTGCTGATT GGTAAAGTGT TCCAGCCAAC 

251 TGGAAGCAAA TATTTCTCAG AAGCAGTTTC CTGCTCTCAT CCTCTCCTCG

301 CCATGCCCAC TGTGCCCAAC ATGGCTCCAG CTGGGTCACA GAAGACTTTG 

351 TCCTGGAATA CAGCATTTCC CTATTTAAAT CTCTAACTTT GTATGTACTC 

401 TTTTCAATAA AAGCATATTT TTCATTACCA ttctgaccat actcccttct 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 99 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: exon.g25m8 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

1 TCTTCAAGAT AGAAAATAAG CTGTCTCTGA AGAGGTGCAA GTGGGATGCT 
51 CCCAGGCATT TCTTGTAAGT TGTAATACCA CATTGCCTGA AATGATAGT

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS:

(B) TYPE: nucleic acid

(ii) MOLECULE TYPE: genomic DNA

(vi) ORIGINAL SOURCE:

(ix) FEATURE:

(A) NAME/KEY: EXON: 36-174

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:

...

(2) INFORMATION FOR SEQ ID NO: 53:

...

(2) INFORMATION FOR SEQ ID NO: 54:

...

(2) INFORMATION FOR SEQ ID NO: 55:

...

(2) INFORMATION FOR SEQ ID NO: 56:

...

(2) INFORMATION FOR SEQ ID NO: 57:

...

(2) INFORMATION FOR SEQ ID NO: 58:

...

(2) INFORMATION FOR SEQ ID NO: 59:

CGGAATTCTGCAGATCT

... infection, the adenoviral infection of host cells does not result in chromosomal integration because adenoviral DNA can replicate in an episomal manner without potential genotoxicity. Also, adenoviruses are structurally stable, and no genome rearrangement has been detected after extensive amplification. Adenovirus can infect virtually all ... cells regardless of their cell cycle stage. So far, adenoviral infection appears to be linked only to mild disease such as acute respiratory disease in humans.

... as a gene transfer vector because of ... (ITRs), which are cis elements necessary for viral DNA replication and packaging. ... of the genome contain different transcription units ... (E1A and E1B) ... regulation of transcription of the viral genome ... proteins ... involved in ... expression and host cell shut-off (Renan, 1990). The ... including the majority of the viral capsid proteins, are expressed only after significant processing of a single primary transcript issued by the .... The MLP (located at 16.8 m.u.) is particularly efficient ... and all the mRNAs issued from this promoter ... sequence which makes them preferred mRNAs for ...

... recombinant adenovirus is generated from ... between shuttle vector and provirus vector. Due to the possible recombination between two proviral vectors, wild-type adenovirus may be generated from this process. Therefore, it is critical to isolate a single clone of virus from an individual plaque and examine its genomic structure.
(2) INFORMATION FOR SEQ ID NO: 60: 

(NH2)-GTAGTGCAAGGCTC ...

(2) INFORMATION FOR SEQ ID NO: 61: 
(NH2)-TGAGTAGAATTCTAACGGCCGTCATTGTTC

(2) INFORMATION FOR SEQ ID NO: 62:
GAACAATGACGGCCGTTAGAATTCTACTCA-(NH2)

(2) INFORMATION FOR SEQ ID NO: 63:
TGAGTAGAATTCTAACGGCCGTCAT

(2) INFORMATION FOR SEQ ID NO: 64:
(PO4)-GTAGTGCAAGGCTCGAGAAC

(2) INFORMATION FOR SEQ ID NO: 65:
(PO4)-TGAGTAGAATTCTAACGGCCGTCATTG

(2) INFORMATION FOR SEQ ID NO: 66:

(NH2)-GTAGTGCAAGGCTCGAGAACTTTTTTTTTTTTTTT

(2) INFORMATION FOR SEQ ID NO: 67:
(PO4)-GTAGTGCAAGGCTCGAGAACTTTT

(2) INFORMATION FOR SEQ ID NO: 68:

(NH2)-GTAGTGCAAGGCTCGAGAACNNNNNNNNNNNN

(2) INFORMATION FOR SEQ ID NO: 69:
(NH2)-TGAGTAGAATTCTAACGGCCGTCATTGTTC

(2) INFORMATION FOR SEQ ID NO: 70:
GAACAATGACGGCCGTTAGAATTCTACTCA-(NH2)

(2) INFORMATION FOR SEQ ID NO: 71:
TGAGTAGAATTCTAACGGCCGTCAT

(2) INFORMATION FOR SEQ ID NO: 72:
(PO4)-GTAGTGCAAGGCTCGAGAAC

(2) INFORMATION FOR SEQ ID NO: 73:
(PO4)-TGAGTAGAATTCTAACGGCCGTCATTG
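In the oligonucleotides above, SEQ ID NO: 62 reads as the reverse complement of SEQ ID NO: 61 (the same pair recurs as SEQ ID NO: 69 and 70), so the two can anneal into a fully paired duplex, and SEQ ID NO: 63 and 65 are 5' prefixes of SEQ ID NO: 61. A short Python sketch that checks these relationships, treating the NH2 and PO4 end groups as outside the base sequence:

```python
# Sequences as printed for SEQ ID NO: 61, 62, 63 and 65 (terminal NH2/PO4 groups omitted).
SEQ_61 = "TGAGTAGAATTCTAACGGCCGTCATTGTTC"
SEQ_62 = "GAACAATGACGGCCGTTAGAATTCTACTCA"
SEQ_63 = "TGAGTAGAATTCTAACGGCCGTCAT"
SEQ_65 = "TGAGTAGAATTCTAACGGCCGTCATTG"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the partner strand read 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement(SEQ_61) == SEQ_62)                   # True
print(SEQ_61.startswith(SEQ_63), SEQ_61.startswith(SEQ_65))  # True True
```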

... methods differ in their respective abilities to entrap aqueous material and their respective aqueous space-to-lipid ratios.

... prepared as described above may be ... diluted to an appropriate concentration with a suitable solvent, e.g., DPBS. The mixture is then vigorously shaken in a vortex ... removed by centrifugation at 29,000 x g and the ... washed liposomes are resuspended at an appropriate ... about 50-200 mM. The amount of nucleic acid ... in accordance with standard methods. After determination of the amount of nucleic acid encapsulated in the liposome preparation, the liposomes may be diluted to appropriate concentration and stored at 4°C until use.

In a preferred embodiment, the lipid dioleoylphosphatidylcholine is employed. Nuclease-resistant oligonucleotides were mixed with lipids in the presence of excess t-butanol. The mixture was vortexed before being frozen in an acetone/dry ice bath. ... and hydrated with Hepes-buffered saline (1 mM Hepes, ...) ... and then the liposomes were sonicated in a bath type sonicator for 10 to 15 min. The size of the liposomal-oligonucleotides typically ranged between 200-300 nm in diameter as determined by the submicron particle sizer autodilute model 370 (Nicomp, Santa Barbara, CA).

(2) INFORMATION FOR SEQ ID NO: 74:

AGG AAG TAT ATC TAA GTC ACC TCC A

(2) INFORMATION FOR SEQ ID NO: 75:

...

(2) INFORMATION FOR SEQ ID NO: 76:

(PO4)-...

(2) INFORMATION FOR SEQ ID NO: 77:

...

(2) INFORMATION FOR SEQ ID NO: 78:

(PO4)-TT ACG GCT ACT GGA GGT GAC TTA

(2) INFORMATION FOR SEQ ID NO: 79:

(PO4)-AA GTC TCC AGG GCA CAT CTG A

(2) INFORMATION FOR SEQ ID NO: 80:

...

(2) INFORMATION FOR SEQ ID NO: 81:

(PO4)-CGAAGGAAAGCTTCCAATTATG
4.8.4 Viral Delivery Systems

... the invention, the expression construct comprises a virus or engineered construct derived from a viral genome. The ability of certain viruses to enter cells via ... endocytosis, to integrate into a host cell genome and express viral genes stably and efficiently have made them attractive candidates for the transfer of foreign genes into mammalian cells (Ridgeway, 1988; ...). ... gene therapy vectors are generally viral vectors.

(2) INFORMATION FOR SEQ ID NO: 82:

GTA ATG AAA TCT GAG AAG CTG AA

(2) INFORMATION FOR SEQ ID NO: 83:

CAC ACA ...

...

(2) INFORMATION FOR SEQ ID NO: 86:
GTT TTC CCA GTC ACG ACG TTA TCT GTT CAC TTC ACC TTT G 

(2) INFORMATION FOR SEQ ID NO: 87: 
AGG AAA CAG CTA TGA CCA TCC TGA GCT TTC AAA AAA GTA TTC

(2) INFORMATION FOR SEQ ID NO: 88: 
AGG AAA CAG CTA TGA CCA TGG TCT TGA CTT TTC ATT TAC TTC 

(2) INFORMATION FOR SEQ ID NO: 89: 
TAG CAT TGT TTG AAG CCA CAG 

(2) INFORMATION FOR SEQ ID NO: 90: 
CTG GAA GAA ACC TGT AAC TTG 

(2) INFORMATION FOR SEQ ID NO: 91: 
GTT TTC CCA GTC ACG ACG TGA AGC CAC AGA GTT TTA GAG 

(2) INFORMATION FOR SEQ ID NO: 92: 
AGG AAA CAG CTA TGA CCA TTG TTC TCA AAT AAT GTC CCA AA 

(2) INFORMATION FOR SEQ ID NO:93: 

GTA ATG CTA TAA TGT TTG AAA GG

(2) INFORMATION FOR SEQ ID NO: 94: 
TTC AGG CTA ACT TCC ATC TTC 

(2) INFORMATION FOR SEQ ID NO: 95: 
GTT TTC CCA GTC ACG ACG GGT TAC CCC AAC ATA CCT ATG 

(2) INFORMATION FOR SEQ ID NO: 96: 
AGG AAA CAG CTA TGA CCA TAA ATA GCA TAC ATA ATG TTT ATT C 

(2) INFORMATION FOR SEQ ID NO: 97: 
CAA AGA GTA TGG GAG GCT GA 

(2) INFORMATION FOR SEQ ID NO: 98: 
ACT TCA GAG AAC AAC TTC GTC C 
(2) INFORMATION FOR SEQ ID NO: 99:

GTT T...

(2) INFORMATION FOR SEQ ID NO: 100:

AGG AAA ...

(2) INFORMATION FOR SEQ ID NO: 101:

GTG ...

... 1991). In yet further embodiments, ... complexed or employed in conjunction with both ... been successfully employed in transfer and expression of a polynucleotide in vitro and in vivo, then they are applicable for the present invention. Where a bacterial promoter is employed in the DNA construct, it also will be desirable to include within the liposome an appropriate bacterial ...

(2) INFORMATION FOR SEQ ID NO: 102:

AAT GAA CCT M ACA GTG M AGG CAG

"Liposome" is a generic term encompassing a variety of single and multilamellar



... formed by the generation of enclosed lipid bilayers. Phospholipids are ... the present invention and can carry a net positive charge, a net negative charge or are neutral. Dicetyl phosphate can be employed to confer a negative charge on the liposomes, and stearylamine can be used to confer a positive charge on the liposomes.

(2) INFORMATION FOR SEQ ID NO: 103:

GTT ...

(2) INFORMATION FOR SEQ ID NO: 104:

AGG AAA CAG CTA TGA CCA TGT TCT TTT ACA TCT TAA CTCC RG



Lipids suitable for use according to the present invention can be obtained from commercial sources. For example, dimyristyl phosphatidylcholine ("DMPC") can be ..., dicetyl phosphate ("DCP") is obtained from K & K ... NY); cholesterol ("Chol") is obtained from Calbiochem (La Jolla, CA); ... ("DMPG") and other lipids may be obtained from .... Stock solutions of lipids in chloroform, chloroform/methanol or t-butanol can be stored at about -20°C. Preferably, ... since it is more readily evaporated than methanol.

Phospholipids from natural sources, such as egg or soybean phosphatidylcholine, ... phosphatidylinositol, heart cardiolipin and plant or bacterial phosphatidylethanolamine are preferably not used as the primary phosphatide, i.e., constituting 50% or more of the total phosphatide composition, because of ... the resulting liposomes.

... the present invention can be made by different methods. The size of the liposomes varies depending on the method of synthesis. A ...

(2) INFORMATION FOR SEQ ID NO: 105:

TCT AGT CAG CCT TCT TGA AC

(2) INFORMATION FOR SEQ ID NO: 106:

GAC TAT ...

(2) INFORMATION FOR SEQ ID NO: 107:

GTT TTC CCA GTC A...

(2) INFORMATION FOR SEQ ID NO: 108:

AGG AAA CAG CTA TGA CCA TCA GGG TTT ATC CTT ATG AA

(2) INFORMATION FOR SEQ ID NO: 109:

GTT TTC CCA GTC ACG ACG TCA CAT GCT CAA AAT CTA AA

(2) INFORMATION FOR SEQ ID NO: 110:

AGG ...

(2) INFORMATION FOR SEQ ID NO: 111:

CTG ...

(2) INFORMATION FOR SEQ ID NO: 112:
AAA GAA AGC AGA ACC TTA GC 

(2) INFORMATION FOR SEQ ID NO: 113: 
GTT TTC CCA GTC ACG ACG TTC TCC TTA CCA TTA GAG CA

(2) INFORMATION FOR SEQ ID NO: 114: 
AGG AAA CAG CTA TGA CCA TAT AGG TGG CCT TGT TAT GTA 

(2) INFORMATION FOR SEQ ID NO:115: 
TTC TCC TTA CCA TTA GAG CAC 

(2) INFORMATION FOR SEQ ID NO: 116: 
[FAM]-CC TTC GGA TTT GTT CAA GTC 

(2) INFORMATION FOR SEQ ID NO: 117: 
CCA TTT GCC TAA TGA ATG AA 

(2) INFORMATION FOR SEQ ID NO: 118: 
GTC AGA AAA TCT TGG GTG TA 

(2) INFORMATION FOR SEQ ID NO: 119:

GTT TTC CCA GTC ACG ACG CTT AAG AAA GAG ATT GCC A 
(2) INFORMATION FOR SEQ ID NO: 120: 

AGG AAA CAG CTA TGA CCA TGC AAT GTG GTA TTA CAA CTT A 
(2) INFORMATION FOR SEQ ID NO: 121:

GTT TTC CCA GTC ACG ACG AAA ATA AGC TGT CTC TGA AG 
(2) INFORMATION FOR SEQ ID NO: 122: 

AGG AAA CAG CTA TGA CCA TGG GTG TAA AAT AAT TTC TGG 

(2) INFORMATION FOR SEQ ID NO: 123: 
CGT CTT ACT CAG TTT TGT ATT CT 

(2) INFORMATION FOR SEQ ID NO: 124: 
CAT CTA GAA GTA TGC ATT TGG TA 
(2) INFORMATION FOR SEQ ID NO:125:

GTT TTC C...

(2) INFORMATION FOR SEQ ID NO: 126:

AGG ...

(2) INFORMATION FOR SEQ ID NO: 127:

GTT ... GTC ACG ACG AAT CCC TGA ATG GAT AGC ACC C

(2) INFORMATION FOR SEQ ID NO: 128:

...

(2) INFORMATION FOR SEQ ID NO: 129:

CTG ...

(2) INFORMATION FOR SEQ ID NO:130:

... a nucleic acid in a cell ... including a marker in the expression construct. ... cell permitting identification of expression. Enzymes such as herpes simplex virus thymidine kinase ... or chloramphenicol acetyltransferase (CAT) (prokaryotic) may be employed.

... polyadenylation signal to effect proper polyadenylation ... to be crucial to ... and such sequence may be employed. For ... adenovirus polyadenylation signal may be employed. Also contemplated as an element of the expression cassette is a terminator. These elements can serve to enhance message levels and to minimize read through from the ...

(2). INFORMATION FOR SEQ ID NO:131: 

GTT TTC CCA GTC ACG ACG GAG TTA CAT TCA TTT TTC GAG TC

(2) INFORMATION FOR SEQ ID NO: 132:

AGG AAA CAG CTA TGA CCA TTT CAA GAC CAG CCT GAC CAR

(2) INFORMATION FOR SEQ ID NO: 133:

TC ...

(2) INFORMATION FOR SEQ ID NO: 134:

CAT AGA ...

(2) INFORMATION FOR SEQ ID NO: 135:

GT ...

(2) INFORMATION FOR SEQ ID NO: 136:

A ...

(2) INFORMATION FOR SEQ ID NO: 137:

GCA CAG AGC ACA TTC TGG TGA

4.8. Single-chain Antibodies

In yet another embodiment, one gene may comprise a single-chain antibody. Methods for the production of single-chain antibodies are well known to those of skill in the art. The skilled artisan is referred to U.S. Patent 5,359,046 (incorporated herein by reference). ... A single-chain antibody is created by fusing together the variable domains of the heavy and light chains using a short peptide linker, thereby reconstituting an antigen binding site on a single molecule.

Single-chain antibody variable fragments (scFvs) in which the C-terminus of one variable domain is tethered to the N-terminus of the other via a 15 to 25 amino acid peptide or linker have been developed without significantly disrupting antigen binding or specificity of the binding (Bedzyk et al., 1990; Chaudhary et al., 1990). These Fvs ... in the heavy and light chains of the native antibody.

Single-chain ... UC41 gene are ... of the present invention.
(2) INFORMATION FOR SEQ ID NO: 138:
TCC CAA AGA AAA CTA CTA GCC

(2) INFORMATION FOR SEQ ID NO: 139:
GTT TTC CCA GTC ACG ACG CTG ATG ATC ACA GTC TCT AAG

(2) INFORMATION FOR SEQ ID NO: 140:
AGG AAA CAG CTA TGA CCA TCC AGC AAA GTT GTT GTT GGTT

(2) INFORMATION FOR SEQ ID NO: 141:
AGA CAG TTG GTA TTT AGG GA

(2) INFORMATION FOR SEQ ID NO: 142:
TCA TTA TTG CAT TTT CTG GA

(2) INFORMATION FOR SEQ ID NO: 143:
GTT TTC CCA GTC ACG ACG AGC CAT TTT CCT CTC TCC A

(2) INFORMATION FOR SEQ ID NO: 144:
AGG AAA CAG CTA TGA CCA TGG GCT TCT TTT CCA CTT CAA

(2) INFORMATION FOR SEQ ID NO: 145:
CAA CCA AAC TAT TAT GAA ACC G

(2) INFORMATION FOR SEQ ID NO: 146:
AGT GGG GAG CCA GTG CTG TTA

(2) INFORMATION FOR SEQ ID NO: 147:
GTT TTC CCA GTC ACG ACG TTA TAA TAA TCA CTA GAG ATA GG

(2) INFORMATION FOR SEQ ID NO: 148:
AGG AAA CAG CTA TGA CCA TAA TCT TGT ATG TTC TCC CAG G

(2) INFORMATION FOR SEQ ID NO: 149:
TTG GTG GCA GTA GAC TGT GGT

(2) INFORMATION FOR SEQ ID NO: 150:
GAC AGC TAT TAC TCA AAT GTC A
Glucose-Regulated Proteins (GRP94 and GRP78)
... Amyloid A (SAA)
Troponin I (TN I)
Platelet-Derived Growth Factor
Duchenne Muscular Dystrophy
SV40
Retroviruses
Papilloma Virus
Hepatitis B Virus
Human Immunodeficiency Virus
Cytomegalovirus

(2) INFORMATION FOR SEQ ID NO: 151:

GTT TTC CCA GTC ACG ACG TAA GAT TGC TAC GCM ACT GT

(2) INFORMATION FOR SEQ ID NO: 152:

AGG AAA CAG CTA TGA CCA TAC ... TAG T

(2) INFORMATION FOR SEQ ID NO: 153:

TGG ACA AGT CAA TGC ACT ...

(2) INFORMATION FOR SEQ ID NO: 154:

TGA TTT AAG CTG CCC AGA TTT C

(2) INFORMATION FOR SEQ ID NO:155:
GTT TTC CCA GTC ACG ACG TCT TCT T ... AG AGA ACC T

(2) INFORMATION FOR SEQ ID NO: 156:
AGG AAA CAG CTA TGA CCA TGG A ... CAC AGT

(2) INFORMATION FOR SEQ ID NO: 157:



GTT TTC CCA GTC ACG ACG ACA GCT ATG AAA TAG AAC AGA G 

(2) INFORMATION FOR SEQ ID NO:158: 
AGG AAA CAG CTA TGA CCA TGC ATA CGT GCA GCA ACA GAG A 

(2) INFORMATION FOR SEQ ID NO: 159: 
TTG GTC TCA GAA ATA ATC TTA CTG G 

(2) INFORMATION FOR SEQ ID NO:160: 
GGA TGT AGC ACC TTG AAA TCA TTC 

(2) INFORMATION FOR SEQ ID NO:161: 
GTT TTC CCA GTC ACG ACG AGC CTA TGG ATG TAT TTA TTC AGT TA 

(2) INFORMATION FOR SEQ ID NO: 162: 
AGG AAA CAG CTA TGA CCA TGT TCC ATT CGT TTC CTA TCA TTA G 

(2) INFORMATION FOR SEQ ID NO: 163: 
GGC AAA AAA ATC AAT AAT ATG 
(2) INFORMATION FOR SEQ ID NO: 164:
CAT TGC CCA CCT GTC TAA C 

(2) INFORMATION FOR SEQ ID NO: 165: 
GTT TTC CCA GTC ACG ACG AAG ATT GTT AAA TGC TAC TGC 

(2) INFORMATION FOR SEQ ID NO: 166: 
AGG AAA CAG CTA TGA CCA TTA TCA CTA TTC CCC TTG GC 

(2) INFORMATION FOR SEQ ID NO: 167: 
GGA ATG TGG AGT AAT GTA AAC 

(2) INFORMATION FOR SEQ ID NO:168: 
CAC CAT GTT GAA ATT AAG CAG 

(2) INFORMATION FOR SEQ ID NO: 169: 
GTT TTC CCA GTC ACG ACG GTA ATT GTT GAT AGT CCT CTG 

(2) INFORMATION FOR SEQ ID NO: 170: 
AGG AAA CAG CTA TGA CCA TCA TAA AAC CAA AGC ATC CG 

(2) INFORMATION FOR SEQ ID NO:171: 
ATT TGC TGT CAC ATT ACC CTG 

(2) INFORMATION FOR SEQ ID NO:172: 
CAG CCT GCC TGG GTG ACA G 

(2) INFORMATION FOR SEQ ID NO: 173: 
GTT TTC CCA GTC ACG ACG TGT CAC ATT ACC CTG TTT ATC 

(2) INFORMATION FOR SEQ ID NO: 174: 

AGG AAA CAG CTA TGA CCA TTA AGA AGA GGT GAT ATT ACT TAC 

(2) INFORMATION FOR SEQ ID NO: 175: 
CTA TTG TAA TGA ATG CTG CTG 

(2) INFORMATION FOR SEQ ID NO: 176: 
CAG AAG ATT ATC GTG GTC ATC 
(2) INFORMATION FOR SEQ ID NO: 177:

GTT ...

(2) INFORMATION FOR SEQ ID NO: 178:

AGG AAA CAG CTA TGA CCA TCG TGG TCA TCA TAA ACT AAA TKT

(2) INFORMATION FOR SEQ ID NO: 179:

AAC ...

(2) INFORMATION FOR SEQ ID NO: 180:

AGA TGA GCA GCC CAC TAT TG

(2) INFORMATION FOR SEQ ID NO: 181:

GTT TTC CCA GTC ACG ...

... physiologic signals can permit inducible expression of an inhibitory protein. For example, a nucleic acid under control of the human PAI-1 promoter results in expression inducible by tumor necrosis factor. Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base, EPDB) could also be used to drive expression of a nucleic acid according to the present invention. Use of a T3, T7 or SP6 cytoplasmic expression system is another possible embodiment. Eukaryotic cells can support cytoplasmic transcription from certain bacterial promoters if the appropriate bacterial polymerase is provided, either as part of the delivery complex or as an additional genetic expression construct.

... may be employed, in the context of the present invention, to regulate the expression of the gene of interest. This ... of all the possible elements involved in the ...






(2) INFORMATION FOR SEQ ID NO: 183:

GTT TTC CCA GTC ACG ACG CAA CTA TTC ATC TCT TAT CTA CC

(2) INFORMATION FOR SEQ ID NO: 184:

AGG ... TTG ATT TC

(2) INFORMATION FOR SEQ ID NO: 185:

GA ...

(2) INFORMATION FOR SEQ ID NO: 186:

TAT CTG AAA AAC TAA TAA GCC AG

... that increase transcription from a promoter located at a distant position on the same molecule of DNA. Enhancers are organized ... composed of many individual elements, each of ... The basic distinction between "enhancers" and "promoters" is operational. An ... as a whole must be able to stimulate transcription at a distance; this need not be true of a promoter region or its component elements. On the other hand, a promoter must have one or more elements that direct initiation of RNA synthesis at a particular site and in a particular orientation, whereas enhancers lack these specificities.



... overlapping and contiguous, often seeming to have a very similar modular organization.

... cellular promoters/enhancers, and inducible promoters/enhancers that could be used in combination with the nucleic acid encoding a gene of interest in an expression construct (Table 2 and Table 3). Additionally, any promoter/enhancer combination (as per the Eukaryotic Promoter Data Base, EPDB) ...

(2) INFORMATION FOR SEQ ID NO: 187:

GT ... CTT TCT ACT CAG AGT CTA TG

(2) INFORMATION FOR SEQ ID NO: 188:

AG ...

(2) INFORMATION FOR SEQ ID NO: 189:

CAG GAT TAT ACT TTC ACT CAA G
(2) INFORMATION FOR SEQ ID NO: 190:
GAC ATT TAA CTT AAT TTC ACT TG 

(2) INFORMATION FOR SEQ ID NO: 191: 

GTT TTC CCA GTC ACG ACG ATA GAC TCA AGA AAA ATG CTA AG 
(2) INFORMATION FOR SEQ ID NO:192: 

AGG AAA CAG CTA TGA CCA TCT CCT TGT TAT TTC TAA ACC AG 

(2) INFORMATION FOR SEQ ID NO: 193 : 
TTG TCT ACC TGA ACC CCG AG

(2) INFORMATION FOR SEQ ID NO: 194: 
CAA AAT GGG GCT TGA TTA GG 

(2) INFORMATION FOR SEQ ID NO: 195: 

GTT TTC CCA GTC ACG ACG TAC CTT TCT GTG CGT GAT AGC 
(2) INFORMATION FOR SEQ ID NO: 196: 

AGG AAA CAG CTA TGA CCA TTT AGG GCT CAA ACT GAA ATG G

(2) INFORMATION FOR SEQ ID NO:197: 
AGCCATTTTCCTCTCTCCA 

(2) INFORMATION FOR SEQ ID NO: 198: 
GTTTTCCCAGTCACGACGCCACCACATACCACACTTC 

(2) INFORMATION FOR SEQ ID NO: 199: 
CAGAATCGCATCAGTAATAGA 


(2) INFORMATION FOR SEQ ID NO: 200: 
GTTTTCCCAGTCACGACGTGAAGACCT...

(2) INFORMATION FOR SEQ ID NO: 201: 
GAAGCTGTGTTCTTTTTTCA 

(2) INFORMATION FOR SEQ ID NO: 202: 
(2) INFORMATION FOR SEQ ID NO: 203:

... described for antisense polynucleotide. Ribozyme sequences ... in much the same way as described for antisense polynucleotide. For example, one could incorporate non-Watson-Crick bases, or make mixed RNA/DNA oligonucleotides, or modify the phosphodiester backbone, or modify ... the RNA.

Alternatively, the antisense oligo- and polynucleotides according to the present invention ... from expression constructs that carry nucleic acids encoding the oligo- or polynucleotides. Throughout this application, the term "expression construct" is meant to include any type of genetic construct containing a nucleic acid ... all of the .... The phrase "under transcriptional control" means that the promoter is in the correct location and orientation in relation to the nucleic acid to control RNA polymerase initiation.

The term promoter will be used here to refer to a group of transcriptional control modules that are clustered around the initiation site for RNA polymerase II. Much of the thinking about how promoters are organized derives from analyses of several viral promoters, including those for the HSV thymidine kinase (tk) and SV40 ... transcription units. These studies, augmented by more recent work, have shown that promoters are composed of ... functional modules, each consisting of ... of DNA, and containing one or more recognition sites for transcriptional activator proteins.

(viii) NAME: splice variant group 1 # 639.seq

(B) LOCATION:

51 GAAATATCTT ACCGAGCACT TTATTCAATC TAAATTTAAA TAGAAGTTTT

401 GTTGCTGACG TTTCTGCCCA AACATCATGA AATGTGGAGA AGAAAGAACA

801 AAATGCTAGA CAATAAAAAA AAAA

(2) INFORMATION FOR SEQ ID NO: 204:

(i) SEQUENCE CHARACTERISTICS:

(vi) ORIGINAL SOURCE:

(B) LOCATION:

(C) EXON COMPOSITION (SEQ ID #S): 3 20 28 38 43 45

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:204: 
51 KESQHLCAKY ASFSGKRLVS EEKTVCCLLL TFLPKHHEMW RRKNTQMCPG
101 DFPFIFKQFF FPLGLTNSIF DLNKSEGKLP IMGNKSSEVA KFPHTQKKR* 



(2) INFORMATION FOR SEQ ID NO: 205: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 850 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 



(viii) NAME: splice variant group 1 # 1407.seq

(ix) FEATURE: 

(A) NAME /KEY : TRANSLATION START: 96 

(B) LOCATION: 

(C) EXON COMPOSITION (SEQ ID #S): 15 20 28 38 43 45
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:205: 

1 GAAAAGTGAA GACCTCTTTG AATTATCTTA TTTCATTTGA CTATGTTCCT 

75 CCTGAGTCAC AAAAAAAGGA TGTTACAGCT ATTTTTCTTA AGCTGATGGG

125 CCAAAAGATT GTAGGTCAGA TGAAAGTGAG ATCCATACAT CCTTCTTCAG

175 CAACTTGTGC CTCTGCTCTG CACCTCCCGC AATTAACTAC TGAAAAAAGA 

225 ACACAGCTTC ACAAAACTAT GGTAACAGTT CCCTTCATTA CTTACTCAAA 

275 CTTCAGAGAG ATAAAAGAGA AGGAGTCACA GCATCTTTGT GCAAAATATG 

325 CCTCGTTTTC TGGGAAAAGG CTTGTTTCAG AAGAGAAGAC AGTGTGCTGC 

375 TTGTTGCTGA CGTTTCTGCC CAAACATCAT GAAATGTGGA GAAGAAAGAA 

425 CACTCAGATG TGCCCTGGAG ACTTTCCTTT TATTTTTAAA CAGTTTTTTT 

475 TTCCATTGGG TTTGACCAAC TCTATATTCG ACTTGAACAA ATCCGAAGGA

525 AAGCTTCCAA TTATGGGGAA CAAGTCCTCT GAAGTGGCTA AATTCCCACA 

575 CACACAAAAG AAAAGATGAG TCTATGCCCA GGACCACCAG ATAATTGAGT 

625 CCTGTACAAA AGCTTCTGAC TAAACAATGT GCTCTGGCTC AGGACTATAC

675 AGAGAAAAGA CACAGTTTTT AAATTGATCG TTCAAAAGGA AACATATTTA 

725 TGATATTTGC TCCATGATAT GTATCTCTCA TCTGTTAGCT CAGGCAGAAT 

775 TAAAATGCTA GACAATAAAA AAAAAA 
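The TRANSLATION START feature marks the first codon of the reading frame. The position labels printed in this entry do not appear to be consistent with the stated start of 96, so the Python sketch below instead takes the ATG inside "AGCTGATGGG" (on the line labelled 75) as the start and translates 105 bases forward with the standard genetic code. The choice of ATG and the truncation are assumptions made for illustration; the result matches the opening residues of SEQ ID NO: 206 and suggests that the characters printed there as "GOMK" and "HI£pL" read GQMK and LPQL.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# 105 bases of SEQ ID NO: 205 beginning at the ATG in "AGCTGATGGG"
# (taken from the lines labelled 75, 125 and 175 above).
ORF_FRAGMENT = (
    "ATGGG"
    "CCAAAAGATTGTAGGTCAGATGAAAGTGAGATCCATACATCCTTCTTCAG"
    "CAACTTGTGCCTCTGCTCTGCACCTCCCGCAATTAACTACTGAAAAAAGA"
)


def translate(orf: str) -> str:
    """Translate codon by codon from the first base, stopping after a stop codon ('*')."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate(ORF_FRAGMENT))  # MGQKIVGQMKVRSIHPSSATCASALHLPQLTTEKR
```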

(2) INFORMATION FOR SEQ ID NO:206: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 157 amino acids 

(B) TYPE: peptide
(ii) MOLECULE TYPE: peptide 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: splice variant group 1 # 1407.pep

(ix) FEATURE: 

(A) NAME /KEY: 

(B) LOCATION:

(C) EXON COMPOSITION (SEQ ID #S): 15 20 28 38 43 45



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:206: 
1 MGQKIVGQMK VRSIHPSSAT CASALHLPQL TTEKRTQLHK TMVTVPFITY
51 SNFREIKEKE SQHLCAKYAS FSGKRLVSEE KTVCCLLLTF LPKKHEMWRR
101 KNTQMCPGDF PFIFKQFFFP LGLTNSIFDL NKSEGKLPIM GNKSSEVAKF

151 PHTQKKR*

... detrimental non-specific inhibition of protein synthesis also can be measured by determining target cell viability in vitro.

(2) INFORMATION FOR SEQ ID NO:207:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 980 base pairs

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: splice variant group 2 # 1291.seq

... "complementary" or "antisense" mean polynucleotides that are substantially complementary over their entire length and have very few base mismatches. For example, sequences of fifteen bases in length may be termed complementary when they have a complementary nucleotide at thirteen or fourteen positions out of fifteen. Naturally, sequences which are "completely complementary" will be sequences which are entirely complementary throughout their entire length and have no base mismatches.
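The fifteen-base example can be made concrete by counting paired positions with the two strands aligned antiparallel. In this Python sketch the target is the first fifteen bases of SEQ ID NO: 81; the probe variants are hypothetical, and the threshold of 13 pairs comes from the fifteen-base example in the text rather than from a general rule for other lengths.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def paired_positions(probe: str, target: str) -> int:
    """Count Watson-Crick pairs when probe (5'->3') lies antiparallel on target (5'->3')."""
    if len(probe) != len(target):
        raise ValueError("probe and target must have the same length")
    # The probe's 5' base faces the target's 3' base, so walk the target backwards.
    return sum(PAIR[p] == t for p, t in zip(probe, reversed(target)))


def is_complementary(probe: str, target: str, min_pairs: int = 13) -> bool:
    """Apply the 'thirteen or fourteen of fifteen' criterion (default assumes a 15-mer)."""
    return paired_positions(probe, target) >= min_pairs


TARGET = "CGAAGGAAAGCTTCC"  # first 15 bases of SEQ ID NO: 81
probes = {  # hypothetical probes with 0, 1, 2 and 3 mismatches
    "perfect": "GGAAGCTTTCCTTCG",
    "one mismatch": "AGAAGCTTTCCTTCG",
    "two mismatches": "AGAAGCTTTCCTTCA",
    "three mismatches": "AGAAGCATTCCTTCA",
}
for name, probe in probes.items():
    print(name, paired_positions(probe, TARGET), is_complementary(probe, TARGET))
```

The perfect probe pairs at all 15 positions ("completely complementary"); the one- and two-mismatch probes still meet the 13-of-15 criterion, and the three-mismatch probe does not.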
(ix) FEATURE:

(A) NAME/KEY: TRANSLATION START: 168

101 ... AGAA ACAACAACGA ACTAGATGAG AATGAGAGGA
151 ... GATG AAAGTGAGAT CCATACATCC TTCTTCAGCA
201 ACTTGTGCCT CTGCTCTGCA CCTCCCGCAA TTAACTACTG AAAAAAGAAC

551 GCTTCCAATT ATGGGGAACA AGTCCTCTGA AGTGGCTAAA TTCCCACACA

651 TCTGGATTCA CTAAAACTAT ACTCTTAATC TGTACCGGGA AGTTCTATAC

751 GACAATCTGC CTGCAACACT CATAATCCCT GAATGGATAG CACCCAATAA

851 CTTTATACCT TATTTGACCT GAATATCTTA TCTTTCCGTG TTAAATGCTC

... contemplated. For example, an antisense construct which has limited regions of high homology, but also contains a non-homologous region (e.g., a ribozyme) could be designed. These ... Although ..., numerous ... oligonucleotide to its ... that oligonucleotides of ..., 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50

(2) INFORMATION FOR SEQ ID NO: 208:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 193 amino acids

(B) TYPE: peptide

... all or part of the gene sequence may be employed in ... construction. Statistically, any sequence 17 bases long should occur only once in the human genome and, therefore, suffice to specify a unique target sequence.
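The statistical argument for a minimum probe length can be checked with a rough estimate: if bases were uniform and independent, one particular k-base sequence would be expected about N/4^k times among N positions. This Python sketch assumes a haploid human genome of about 3.1 x 10^9 bp and counts both strands; real genomic sequence is neither uniform nor independent, so the result is only an order-of-magnitude guide.

```python
GENOME_BP = 3.1e9  # assumption: approximate haploid human genome length


def expected_occurrences(k: int, genome_bp: float = GENOME_BP, strands: int = 2) -> float:
    """Expected chance matches of one specific k-mer under a uniform, independent base model."""
    return strands * genome_bp / 4 ** k


def shortest_unique_length(genome_bp: float = GENOME_BP, strands: int = 2) -> int:
    """Smallest k for which fewer than one chance match is expected."""
    k = 1
    while expected_occurrences(k, genome_bp, strands) >= 1:
        k += 1
    return k


for k in (14, 15, 16, 17):
    print(k, round(expected_occurrences(k), 2))
print("shortest expected-unique length:", shortest_unique_length())  # 17
```

Under these assumptions a 14-base sequence is expected to recur roughly twenty times, while 17 bases is the shortest length expected to occur less than once across both strands; counting a single strand would lower the threshold to 16.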

In certain embodiments, one may wish to employ antisense constructs which include other elements, for example, those which include C-5 propyne pyrimidines.

(iii) HYPOTHETICAL: YES

(vi) ORIGINAL SOURCE:

(viii) NAME: splice variant group 2 # 1291.pep
(ix) FEATURE:

(A) NAME /KEY : 
• (B) LOCATION: 

(C) EXON COMPOSITION (SEQ ID «S) : 3 20 28 38 43 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 208: 

1 MKVRSIHPSS ATCASALHLP QLTTEKRTQL HKTMVTVPFI TYSNFREIKE 

51 KESQHLCAKY ASFSGKRLVS EEKTVCCLLL TFLPKKHEMW RRKNTQMCPG 

101 DFPFIFKQFF FPLGLTNSIF DLNKSEGKLP IMGNKSSEVA KFPHTQKKRC 

151 LQSVWNSDYY PSGFTKTILL ICTGKFYTVA LILGAVYFNQ DTP* 



(2) INFORMATION FOR SEQ ID NO:209: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1007 base pairs 

(B) TYPE: nucleic acid 

(ii) MOLECULE TYPE: cDNA 

(iii) HYPOTHETICAL: YES 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(viii) NAME: splice variant group 2 # 2843.seq

(ix) FEATURE: 

(A) NAME/KEY: TRANSLATION START: 96 

(B) LOCATION: 

(C) EXON COMPOSITION (SEQ ID #S) : 15 20 28 38 43 47 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 209:

1 GAAAAGTGAA GACCTCTTTG AATTATCTTA TTTCATTTGA CTATGTTCCT ...

125 CCTGAGTCAC AAAAAAAGGA TGTTACAGCT ATTTTTCTTA AGCTGATGGG

175 CCAAAAGATT GTAGGTCAGA TGAAAGTGAG ATCCATACAT CCTTCTTCAG

225 CAACTTGTGC CTCTGCTCTG CACCTCCCGC AATTAACTAC TGAAAAAAGA 

275 ACACAGCTTC ACAAAACTAT GGTAACAGTT CCCTTCATTA CTTACTCAAA 

325 CTTCAGAGAG ATAAAAGAGA AGGAGTCACA GCATCTTTGT GCAAAATATG 

375 CCTCGTTTTC TGGGAAAAGG CTTGTTTCAG AAGAGAAGAC AGTGTGCTGC 

425 TTGTTGCTGA CGTTTCTGCC CAAACATCAT GAAATGTGGA GAAGAAAGAA 

475 CACTCAGATG TGCCCTGGAG ACTTTCCTTT TATTTTTAAA CAGTTTTTTT

525 TTCCATTGGG TTTGACCAAC TCTATATTCG ACTTGAACAA ATCCGAAGGA 

575 AAGCTTCCAA TTATGGGGAA CAAGTCCTCT GAAGTGGCTA AATTCCCACA 

625 CACACAAAAG AAAAGGTGCC TGCAATCTGT CTGGAATTCA GACTATTACC 

675 CTTCTGGATT CACTAAAACT ATACTCTTAA TCTGTACCGG GAAGTTCTAT 

725 ACTGTTGCTT TAATCCTTGG TGCTGTTTAT TTCAACCAAG ATACACCTTG 

775 AAGACAATCT GCCTGCAACA CTCATAATCC CTGAATGGAT AGCACCCAAT 

825 AAAAGAGAGA ACATCATTGC CAATTTTTTC TTCTATCCAA GTCCTCCTCT 

875 GTCTTTATAC CTTATTTGAC CTGAATATCT TATCTTTCCG TGTTAAATGC

925 TCTTCTCTCT TGACCTCTAG ATCACTGTAC TCTCTGGGGT CTGCCTCTGT 

975 TTCCCTAATA TTTTCCGCTT TAAATTGTCC ACA 



(2) INFORMATION FOR SEQ ID NO: 210: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 201 amino acids 

(B) TYPE: peptide 

(ii) MOLECULE TYPE: peptide 
(iii) HYPOTHETICAL: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(viii) NAME: splice variant group 2 # 2843.pep

(ix) FEATURE:

(B) LOCATION:

(C) EXON COMPOSITION (SEQ ID #S): 15 20 28 38 43 47

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 210:

...

Inhibitors could potentially be designed for UC41. This is complicated by the fact that
no specific function has been identified for this gene product, and no data is available
on its three-dimensional structure. ... in some cases ... from the primary sequence ...
unknown ... occur in ... of the
serine proteases, like trypsin and chymotrypsin, have extensive sequence homologies
and relatively similar three-dimensional structures. Other general categories of
homologous proteins include different classes of transcription factors, membrane
receptor proteins, tyrosine kinases, GTP-binding proteins, etc. The putative amino acid
sequences encoded by the prostate-specific gene of the present invention may be
cross-checked for sequence homologies against the protein sequence database of the
National Biomedical Research Foundation. Homology searches are standard techniques for
the skilled practitioner.
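As a minimal sketch of what a homology search computes, the Python function below scores a local (Smith-Waterman) alignment between two protein sequences and recovers the aligned segments. It assumes a deliberately simple scoring scheme (match +2, mismatch -1, linear gap -2) instead of a substitution matrix such as BLOSUM62, and the subject sequence in the example is hypothetical; database searches are normally run with established tools such as BLAST or FASTA rather than code like this.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment score of a vs b, plus the aligned segments.

    Uses a linear gap penalty and a flat match/mismatch score (an
    illustrative assumption, not a substitution matrix).
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP matrix, first row/col = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap      # gap in b
            left = h[i][j - 1] + gap    # gap in a
            h[i][j] = max(0, diag, up, left)  # 0 = start a new local alignment
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        score = h[i][j]
        if score == h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif score == h[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


query = "MKVRSIHPSSATCASALHLP"        # first 20 residues of SEQ ID NO: 208
subject = "GGMKVRSIHPSAATCASALHLPGG"  # hypothetical homolog, one substitution
score, aln_q, aln_s = smith_waterman(query, subject)
print(score)  # 37 (19 matches x 2, one mismatch x -1)
```

A local rather than global alignment is used here because homologous proteins often share only a domain, and the zero floor in the recurrence lets unrelated flanking residues be ignored.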

Even the three-dimensional structure may be inferred from the primary sequence
data of the encoded protein(s). Again, if homologies exist between the encoded amino
acid sequences and other proteins of known structure, then a model for the structure of
the encoded protein may be designed, based upon the structure of the known protein.
An example of this type of approach was reported by Ribas de Pouplana and
Fothergill-Gilmore (1994). These authors developed a detailed three-dimensional model
for the structure of Drosophila alcohol dehydrogenase, based in part upon sequence
homology with the known structure of 3α,20β-hydroxysteroid dehydrogenase. Once a
three-dimensional model is available, inhibitors may be designed by standard computer
modeling techniques. This area has been reviewed by Sun and Cohen (1993), herein
incorporated by reference.
... for the presence of the expression products of UC41. Such samples ...
needle biopsy cores, surgical resection samples, lymph node tissue, or ...
embodiments, nucleic acids would be extracted from these samples ...
described above. Some embodiments would utilize kits containing ...
pairs or hybridization probes. The amplified nucleic acids would be ...
expression products by, for example, gel electrophoresis and ethidium bromide ...
or Southern blotting, or a solid-phase detection means as described ...
methods are well known within the art. The levels of expression ...
would be compared with statistically valid groups of metastatic ...
malignant, benign or normal prostate samples. The diagnosis and ...
individual patient would be determined by comparison with such groups.
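The comparison with reference groups could be sketched as below. This is a hypothetical illustration only: it assumes each reference group is summarized by the mean and sample standard deviation of its expression levels and ranks the groups by the absolute z-score of the patient's level. The function name, group labels and numbers are invented for the example, and the text does not specify which statistical comparison is used.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def rank_reference_groups(level: float,
                          groups: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank reference groups by how closely a patient's level matches them.

    Returns (group name, z-score) pairs, closest group (smallest |z|) first.
    """
    ranked = []
    for name, values in groups.items():
        if len(values) < 2:
            raise ValueError(f"group {name!r} needs at least two reference samples")
        sd = stdev(values)  # sample standard deviation of the reference group
        if sd == 0:
            raise ValueError(f"group {name!r} has zero variance")
        ranked.append((name, (level - mean(values)) / sd))
    return sorted(ranked, key=lambda pair: abs(pair[1]))


# Hypothetical reference data in arbitrary expression units (not from the source).
reference = {
    "normal": [1.0, 1.2, 0.9, 1.1],
    "benign": [1.8, 2.1, 2.0, 1.9],
    "malignant": [4.5, 5.0, 5.5, 4.8],
    "metastatic": [8.0, 9.5, 8.7, 9.1],
}
print(rank_reference_groups(5.1, reference)[0][0])  # malignant
```

Ranking by |z| is only one simple choice; a real assay would need validated cut-offs and groups large enough to be statistically meaningful, as the text requires.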

Another embodiment of the present invention involves app... PCR™
techniques to detect circulating prostate cancer cells (i.e., those ...
metastasized), using probes and primers selected from sequences or th...
designated herein as SEQ ID NO:1, SEQ ID NO:3 or SEQ ID NO:4. ...
relative quantitative RT-PCR™ with an internal standard in which the internal standard
is an amplifiable cDNA fragment that is larger than the target cDNA fragment and in
which the abundance of the mRNA encoding the internal standard is roughly 5- to 100-fold
higher than that of the mRNA encoding the target, this assay measures the relative, not
the absolute, abundance of the respective mRNA species.
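Because the target and the internal standard are amplified in the same reaction, the result can be expressed as the ratio of the two product signals, and two samples can be compared as a ratio of those ratios. The sketch below assumes the signals are band intensities in arbitrary units (for example from gel densitometry); the function names and values are hypothetical. Consistent with the text, the output is a relative measure, not an absolute mRNA abundance.

```python
from typing import Tuple


def internal_standard_ratio(target_signal: float, standard_signal: float) -> float:
    """Target product signal relative to the co-amplified internal standard."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal


def relative_fold(sample: Tuple[float, float],
                  reference: Tuple[float, float]) -> float:
    """Ratio of ratios: target/standard in sample over target/standard in reference."""
    return internal_standard_ratio(*sample) / internal_standard_ratio(*reference)


# Hypothetical (target, internal standard) band intensities, arbitrary units.
tumor = (1200.0, 4000.0)
normal = (300.0, 4000.0)
print(round(relative_fold(tumor, normal), 3))  # 4.0
```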

Other studies described below were performed using a more conventional
relative quantitative RT-PCR™ protocol with an external standard. These assays
sample the PCR™ products in the linear portion of their amplification curves. The
number of PCR™ cycles that is optimal for sampling must be determined empirically
for each target cDNA fragment. In addition, the reverse transcriptase products of each
RNA population isolated from the various tissue samples must be carefully normalized
for equal concentrations of amplifiable cDNAs. This is very important because this assay
measures absolute mRNA abundance, and absolute mRNA abundance may be used as a
measure of differential gene expression only in normalized samples. While empirical
determination of the linear range of the amplification curve and normalization of cDNA
preparations are tedious and time-consuming processes, the resulting RT-PCR™ assays
may be superior to those derived from relative quantitative RT-PCR™ with an
internal standard.
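The two empirical steps described above can be sketched as follows, under stated assumptions. Product signal is taken to be measured after each of a series of cycles; the usable sampling range is taken to be the longest run of cycles in which the signal still rises by at least a chosen fold per cycle (1.8 here, an arbitrary threshold near ideal doubling, before the curve bends toward its plateau); and cDNA preparations are equalized against the signal of a hypothetical reference amplicon. The function names, threshold and data are illustrative assumptions, not values from the source.

```python
from typing import Dict, List, Optional, Tuple


def exponential_window(signals: List[float], first_cycle: int,
                       min_fold: float = 1.8) -> Optional[Tuple[int, int]]:
    """Longest run of cycles in which product still rises >= min_fold per cycle.

    signals[k] is the product signal measured after cycle first_cycle + k.
    Returns (start_cycle, end_cycle), inclusive, or None if no step qualifies.
    """
    best_start, best_len = None, 0
    run_start, run_len = 0, 0
    for k in range(len(signals) - 1):
        rising = signals[k] > 0 and signals[k + 1] / signals[k] >= min_fold
        if rising:
            if run_len == 0:
                run_start = k          # a new qualifying run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # amplification efficiency has dropped
    if best_start is None:
        return None
    return first_cycle + best_start, first_cycle + best_start + best_len


def dilution_factors(reference_signals: Dict[str, float]) -> Dict[str, float]:
    """Fraction of each cDNA preparation to use so all match the weakest one.

    reference_signals maps a preparation name to its (hypothetical)
    reference-amplicon signal, measured within the window found above.
    """
    if any(v <= 0 for v in reference_signals.values()):
        raise ValueError("reference signals must be positive")
    lowest = min(reference_signals.values())
    return {name: lowest / value for name, value in reference_signals.items()}


# Hypothetical signals sampled after cycles 15..24 (arbitrary units).
curve = [1, 2, 4, 8, 16, 31, 55, 80, 95, 100]
print(exponential_window(curve, first_cycle=15))          # (15, 20)
print(dilution_factors({"tumor": 40.0, "normal": 20.0}))  # {'tumor': 0.5, 'normal': 1.0}
```

In this example the tumor preparation would be used at half the volume of the normal one, so that both contribute equal amounts of amplifiable cDNA before the target is measured.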

One reason for this is that without the internal standard/competitor, all of the 
reagents may be converted into a single PCR™ product in the linear range of the
amplification curve, increasing the sensitivity of the assay. Another reason is that with 
only one PCR™ product, display of the product on an electrophoretic gel or some other 
display method becomes less complex, has less background and is easier to interpret. 

4.7 Diagnosis and Prognosis of Human Cancer 

In certain embodiments, the present invention allows the diagnosis and

prognosis of human prostate cancer by screening for prostate specific nucleic acids, 
particularly those that are overexpressed in prostate cancer. The field of cancer 
diagnosis and prognosis is still uncertain. Various markers have been proposed to be 